Russian roulette


Attempts to keep foreign interests out of Russian research will only suppress the exchange of 
information, and risk damaging East-West relations. 


Despite decades of intellectual isolation, the Soviet Union produced some
fine science. When it imploded, only a wave of foreign aid and philanthropy
protected that excellent research base from collapse. The strategy worked:
as individualism and entrepreneurship took hold in Russia, science regained
its strength and started to look outwards, as any successful research
endeavour must in the twenty-first century.

Yet Russian President Vladimir Putin believes that his country can
increasingly go its own way, and centralism and anti-Western rhetoric
are on the rise. Science is beginning to suffer from paranoid state control.

As we report on page 486, Russia has placed strict new rules on
how its scientists can operate. In response to a recently amended law,
Russian universities and research institutes have begun to instruct
scientists to seek permission from the Federal Security Service before
they submit papers or give talks at scientific conferences.

The wording of the law is vague, seemingly deliberately so. It
effectively requires any work that is applicable to industry to be
approved for publication. Russian scientists are rightly outraged by
this return to inglorious Soviet practices.

Meanwhile, dozens of organizations that receive foreign funding
(and which the Russian government suspects are involved in “political
activities”, again vaguely defined) are under scrutiny. Officially,
this is to identify and repel unwelcome foreign influence. Unofficially,
there is a whiff of political scores being settled.

In May, the Dynasty Foundation, Russia’s largest private science-funding
organization, shut down after the Ministry of Justice labelled
it a “foreign agent”. Other philanthropic groups and foreign-funded
foundations fear that they may soon find themselves on a list of
“undesirable” organizations that the Russian parliament is drawing up.

This is not the 1960s. Today, fear and isolationism can only damage
collaborative science. In turn, this will undermine Russia’s efforts to
modernize its struggling economy. Putin knows only too well that his
country’s dependence on oil and gas exports is a treacherous anachronism
as the world steers away from fossil-fuel use. Wisely, the government
has substantially stepped up its science funding in recent
years. But neither a multibillion-rouble nanotechnology initiative,
launched in 2007, nor attempts to create a number of world-class
research universities and attract top Western scientists to Russian
labs will bear fruit if fear and distrust continue to stand in the way of
a liberal science culture.

Russia's annexation of the Crimean Peninsula last year, and its dubious
role in the ongoing conflict in the rest of Ukraine, chilled East-West
collaborations, in science and other fields. Russia’s controversial military
involvement in the civil war in Syria, although cautiously tolerated by
Western powers, threatens to cause further tension.

Through large European research facilities such as the particle-physics
laboratory CERN and the international nuclear-fusion project
ITER, science can still offer a much-needed peaceful counterbalance
in these politically turbulent times. But a disturbingly anti-Western
speech to the upper chamber of the Russian parliament by Putin's top
science adviser on 30 September, the same day that Russia began its
air strikes in Syria, testifies to the level of misunderstanding that is
currently poisoning East-West relations across the board.

The speech by Mikhail Kovalchuk, director of the Kurchatov
Institute of nuclear science in Moscow and a key contact for many
international collaborations, delivered a patently absurd account,
riddled with lies and propaganda, of how international science is a US
plot to undermine Russia. Such anti-Western sentiments are readily
echoed in Russia: last week, a high-ranking IT adviser to the government
said that Russia should stop training computer experts because they
will before long be serving Western interests.

Making a bogeyman of the outside world — and in particular of 
the United States — is a populist political strategy intended to prepare 
the ground for anti-liberal isolationism. For Russia's scientific com- 
munity, a crackdown on academic freedom and foreign support will 
be devastating. Putin, who frequently expresses his appreciation of 
science, must see that investment alone is not enough. 

To pour cash into a system that stifles intuition, brilliance and truth 
will not help a nation that has always held scientists and explorers in 
great esteem. Even through difficult economic and political times, 
Russian science has produced a never-ending supply of great minds. 
It needs the freedom and respect to continue to do so.


Abstract thoughts 


Scientists, meeting organizers and the media 
must take care with preliminary findings. 


The rough and tumble of professional science is no place for the
faint-hearted. Progress rests on honest appraisal of methods and
results. Ideas must be challenged and conclusions defended. One
of the most important transitions for any researcher is swapping the
textbook scrutiny of the undergraduate years for critical and creative
thinking. At the centre of this culture is the academic conference.
Often the first chance for studies to be presented, discussed and
criticized, these meetings are an important testing ground for early
research. The community gets a heads-up on what others are doing,
and how, and the scientists involved get some robust feedback that
can shape their work.

Against such criteria, the presentation of preliminary data from a
search for the genetic roots of homosexuality, at a meeting of the
American Society of Human Genetics in Baltimore, Maryland, earlier this
month, was a success. So why does it feel as if something went wrong?

In a ten-minute talk at the meeting, lead researcher Tuck Ngun 
described how his team scanned the DNA of 37 pairs of identical 
twins for chemical, or epigenetic, tags. They found a handful of simi- 
larities between many of the gay twins that were not present in their 
straight brothers. 

Epigenetic tags, which often regulate gene expression, can be both 
inherited and affected by environmental factors, as seems to be the 
case for homosexuality itself. The findings were preliminary, but the 
idea that epigenetics is involved in sexual orientation is certainly plau- 
sible, and the researchers hoped that their findings would stimulate 
future research. Most labs shy away from studying homosexuality 
because funders are reluctant to wade into the topic and because of 
the well-founded worry that findings will be used in the misguided 
search for a ‘cure’.

A flurry of press coverage ensued. Although some of the stories 
noted the study’s small sample size and need for replication — limi- 
tations that the researchers readily acknowledged — others were 
somewhat less than circumspect. ‘Have They Found the Gay Gene? 
Breakthrough in the US’, screamed the front page of one newspaper. 

Responding to the press coverage, many commentators took aim 
at the science — or at least what science was available in the 368-word 
conference abstract. The statistical analyses that the authors used are 
controversial, and there is a legitimate debate to be had. But short on 
hard information, the criticism turned into attack. 


A few critics went so far as to argue that the authors should not have
presented such preliminary work at the meeting. And at least one
suggested that the authors could have provided preprints of their study
when presenting it. These arguments seem to misunderstand the
traditional, and still useful and relevant, role of such gatherings. Studies
with small sample sizes and controversial methods are presented at
conferences all the time, and many scientists already fear being scooped
when they present even a bit of their data.

It is unlikely that most newspapers seek science stories by meticulously
scanning the abstract lists for foreign scientific conferences.
It is much more likely that the wide coverage afforded to the epigenetics
study arose because the story was presented to news desks in a press
release from the conference organizers, and this is where there are
lessons to learn.

The press release, which was not seen or approved by all the scientists
involved, was titled “Epigenetic Algorithm Accurately Predicts
Male Sexual Orientation”. It certainly added to the potential for the
study to be misinterpreted. The organizers have pledged to reconsider 
how they select which conference talks to highlight before a meeting, 
and how press releases are approved. 

The genetics of homosexuality is a subject that will always 
find media coverage, partly because of the societal interest in 
the topic. Neither the scientists nor the conference organizers 
can be held responsible for how some in the media chose to write 
about the study. But both could have done more to get the right 
message across. 


Pick and mix 


Food regulators are right to place new forms of 
data on the safety menu. 


[…] chocolate, Bangladeshi samosas, Chilean cornbread flans,
Turkmenistani beef chapattis: the aromas of the world’s traditional
foods mingle seductively along the mile of pavilions at Expo
Milano 2015, this year’s world fair, dedicated to food. All delicious,
but are they all safe? Will future foods be safe? Who is to judge, and
on what evidence?

In Europe, the European Food Safety Authority (EFSA) decides
whether a new food can be marketed, and its job (like that of all similar
regulatory agencies around the world) is getting tougher. Technological
advances are creating ever more novel foods.

The same technologies, along with the Internet and databases,
have created more sources of information that may have a bearing on
safety assessment: terabytes of molecular information from genomic
or proteomic analyses, for example, or more-qualitative data generated
through crowdsourcing.

Public trust in EFSA’s decisions is patchy and, until now, the agency
has been slow to engage with the problems and solutions that these
technologies offer. But at a three-day conference in Milan (attached
to the Expo, and concluding on 16 October, World Food Day) EFSA
announced a new commitment to take on the modern challenges. As
it does so, it can start to repair its rather undeserved reputation for
non-transparency.

Created in 2002 and based in Parma, Italy, the agency is probably
best known as the independent scientific advisory agency to the
European Union, whose independent scientific advice on the safety of
genetically modified (GM) cereals has been serially rejected by many
EU member states.

In most cases, EFSA’s science-based recommendations on the safety of
new food products are accepted politically without too many questions.
But the GM saga has encouraged a public distrust in its official scientific 
expertise. The scientific experts commissioned by EFSA over the years 
to analyse data on whether GM technologies or products are risky to 
health or the environment have seen their recommendations challenged 
time and again by protest groups that claim to have new data on dangers. 
As a one-off exception to the single-market rule, EU member states can 
decide on an individual basis whether they want to allow cultivation of 
a particular crop. Nineteen have registered their decisions to opt out, 
despite EFSA’s seal of safety.

EFSA does a good job of risk assessment and is reasonably trans- 
parent — but to stop distrust from seeping into all areas of its work it 
needs to do more. Risk assessment is a complicated science to convey 
to the public and is becoming even more complex with every new 
potential source of information. EFSA must be transparent about the 
exact data that it uses to make individual judgements and about the 
methods it uses to determine the degree of uncertainty around those 
judgements. It must also find ways to transparently assign appropriate 
weight to different data types that have been collected with varying 
degrees of scientific rigour. 

The agency is on the case. This year, it carried out a public consul- 
tation on the communication of uncertainties, and it is rolling out a 
toolbox of methods to be systematically tested over the next year. Such 
methods may address, for example, how to weigh up evidence gener- 
ated from computer modelling, from animal data generated in labs 
or from data gathered over social media — or how to assess whether 
a particular change observed in an organism is biologically relevant. 

By definition, risk assessment will never be able to deliver simple 
answers. And concerned citizens, rightly, will never place blind trust 
in scientific expertise. That is why transparency about both data 
sources and analysis methods is so important. Different people may 
even interpret the same complex data set dif- 
ferently. Citizens just need to be given a clear 
picture of how a risk assessor has interpreted 
data — so that they can challenge or accept the 
final decision of the risk manager.
Indigenous peoples must benefit from science


To drive sustainable development, Dyna Rochmyaningsih argues, science
must empower rural communities, not just serve industry and governments.


The sun has been pale for months here in Sumatra and the skies are
grey all day, choked with pollution from the massive fires that
rage across the Indonesian island. Since the late 1990s, the haze
caused by these annual fires has posed a significant threat to the health
of Sumatra’s rural communities. This year’s haze is especially bad and
has affected major cities, both here and abroad; consequently, the fires
have again made headlines around the world.

Many of these news stories blame the big palm-oil companies for
the fires. Slash-and-burn techniques remain the cheapest way to clear
forest for new plantations. But scientific evidence suggests that this
simple narrative is not absolutely true. A number of surveys have
found that the bulk of these fires are started outside the official
oil-palm concessions. Small-scale farmers seem to be more to blame.

The haze in Indonesia is not just an environmental issue; it is a
complex socio-economic problem that is driven partly by conflict over
land ownership between palm-oil companies and rural communities,
a struggle that the companies usually win.

Besides holding financial and legal power, these companies also have
science on their side. High-quality research at state-funded centres has
found ways to increase the production of palm oil, such as the
manipulation of the gene SHELL and ways to weed out oil-palm clones
with reduced yields. These technologies have been developed by the
Malaysian Palm Oil Board, and the big companies in the region can pay
to license and use them. But such technologies are out of the reach of
smallholders and the rural population. Yet smallholders produce a large
proportion of the crops, mainly through conventional farming practices.

Some 80% of Indonesian rubber, for example, is made by small-scale
farmers who do not have access to the research products and whose
welfare has not improved. What has science done to empower these people?

The problems of Indonesian farmers might seem low on the list of
global priorities. But as the nations of the world prepare to discuss a
treaty on climate change in Paris next month, the fires that fuel the
Sumatran haze offer a perfect example of how the relationship between
science and industry must shift if we value sustainable development.

Scientists need the private sector to provide funding and a ‘tunnel’
for commercialization; the private sector needs scientists to develop
products. This alliance, together with support from the government, is
called the triple helix, a concept that has driven the world’s economy
since the Industrial Revolution. But is this concept still relevant?

Although some parts of the world have achieved a stable economy driven
by scientific advancement, around half of the world’s population still
lives in poverty. The people of these regions also face environmental
threats, such as deforestation and its extended impact, on a daily basis.
Those who are most vulnerable benefit from science the least.

There are scientists who want to transfer their knowledge to these 
people, but this has proved difficult. The failure of an experiment in the 
Solomon Islands to help indigenous people to exploit their local envi- 
ronment as ‘ecosystem services’ was attributed to a culture gap between 
scientists and local people. This claimed divide is often presented as a 
barrier to the transfer of science and technology. 

Scientists must try harder to bridge this gap. Science is a fuel for 
economic development, but its influence must 
extend beyond the triple helix. That model simply 
uses science to exploit natural resources for eco- 
nomic gain. Given the need to mitigate the harm- 
ful environmental effects of this conversion, the 
model is no longer enough. 

Mitigation must be the responsibility of every- 
one on the planet, not just scientists, businessmen 
and policymakers. Indigenous and local people 
should also be involved, especially those who call 
carbon sinks, such as tropical forests, home. 

There are already examples of science reaching 
out. The residents of the Wanang Conservation 
Area in Papua New Guinea, for instance, have 
offered 1,000 hectares of their 10,000-hectare 
protected forest for research conducted by insti- 
tutions such as the Smithsonian Tropical Research 
Institute’s Center for Tropical Forest Science. In 
this zone, scientists and indigenous people col- 
laborate to investigate the response of trees to cli- 
mate change. Local people are trained then employed as field research 
assistants and have received compensation for the lease of their forest. 

Meanwhile, a project supported by the US Agency for International 
Development is training local people in West Kalimantan, Indonesia, to 
be plant parataxonomists. The project was initiated by Campbell Webb,
a plant evolutionary biologist and bioinformaticist at the Arnold Arboretum
of Harvard University who is based in West Kalimantan. It is
teaching local people to collect plant data in Gunung Palung National 
Park, an area of high biodiversity that faces the threat of deforestation. 

The Paris talks should discuss the need for such initiatives to be cop- 
ied and scaled up. For decades, the relationship between science, indus- 
try and government has been celebrated by all involved as a good thing. 
But not everybody benefits. Science might be able to pin the blame for 
the southeast Asia haze on Indonesian smallholders, but it has not yet 
given them, or others in their position, a way to help prevent it.


Dyna Rochmyaningsih is a freelance science journalist in Sumatra. 
e-mail: drochmya87@gmail.com 
RESEARCH HIGHLIGHTS


Droplets surf 
graphene waves 


Tiny particles of liquid move 
quickly across thin layers of 
carbon by ‘surfing’ waves that 
ripple through the sheets. 

Angelos Michaelides at 
University College London and 
his colleagues used computer 
simulations to investigate 
how liquids move across 
graphene — a layer of carbon 
one atom thick. Graphene has 
wave-like ripples that transport 
nanometre-scale droplets of 
water and oil, and even ice 
particles. This happens because 
the particles are attracted to the 
high density of carbon atoms 
in the wave trough. These 
nanodroplets move much more 
quickly on flexible layers of 
material such as graphene than 
on rigid materials like metal. 

If validated by experiments, 
this mechanism could be used 
to control the delivery of water- 
soluble drugs on surfaces 
coated with a layered material, 
the authors say. 

Nature Mater. http://dx.doi.org/10.1038/nmat4449 (2015)


Bionic touch
lights up neurons


A thin, flexible device can sense a wide range of pressures and
produces signals that stimulate nerve cells in a dish.

Zhenan Bao of Stanford University in California and her collaborators
embedded carbon nanotubes in a rubbery polymer and attached that
material to a flexible circuit (pictured mounted on a robotic hand).
The device mimicked the response of touch-sensitive nerve cells in
the skin by emitting discrete electrical spikes of increasing
frequency in response to applied pressure. The team converted the
electronic signal into light that then stimulated genetically
engineered, light-sensitive mouse neurons in vitro.

Such artificial skin could one day restore sensation for people
wearing prostheses, the authors say.

Science 350, 313–316 (2015)


Pluto hosts wildly varying terrain


The first published findings from NASA’s New Horizons mission to Pluto
confirm that the dwarf planet has geological features that resemble
those found on Mars and various moons in the Solar System.

NASA’s spacecraft flew past Pluto in July, sending back reams of data
that have been analysed by Alan Stern at the Southwest Research
Institute in Boulder, Colorado, and his colleagues. Broad, bright plains
on Pluto known as Sputnik Planum seem to be covered by nitrogen
glaciers; these quickly erase craters made by crashing asteroids. Nearby
lies the dark Cthulhu region, which is covered in craters that are
thought to be up to 4 billion years old.

Pluto also hosts unique features, such as ‘snakeskin’ terrain that may
have been sharpened into ridges over time as material froze and then
sublimated away.

Science 350, 292 (2015)


Caffeine keeps
bees coming back


Caffeine-infused nectar tricks honeybees into changing their foraging
behaviour in ways that may benefit the plant.

Many plants produce the bitter-tasting caffeine to deter herbivores,
but also rely on bees to spread their pollen for reproduction. To look
at caffeine’s effect on pollinators, Margaret Couvillon and her
colleagues at the University of Sussex near Brighton, UK, monitored
honeybees feeding from a sugar solution. They then compared these bees’
behaviour with that of bees feeding on the same solution but with
caffeine added at a concentration found in nectar. The caffeine-fuelled
bees revisited the feeders more frequently than did the control bees,
and they at least tripled the number of waggle dances they performed to
recruit bees from the hive.

Because caffeine disguises a reduced sugar concentration, the nectar
the bees take back to the hive might be substandard. That could mean
that the colony would produce less honey, the authors predict.

Curr. Biol. http://dx.doi.org/10.1016/j.cub.2015.08.052 (2015)


ANIMAL BEHAVIOUR 


Electric eels use 
shocks to sense 


Electric eels send out strong 
zaps to track moving prey by 
their electrical conductivity, 
enabling the eels to strike with 
remarkable precision. 

Electric eels (Electrophorus 
electricus; pictured) are known 
to use electricity to stun their 
prey, and have electrical 
sensors (pictured in pink). To 
see whether the high-voltage 
zaps have a sensory role, 
Kenneth Catania at Vanderbilt 
University in Nashville, 
Tennessee, presented the eels 
with a twitching fish in an 
insulated plastic bag and a
conductive rod. 

The eels reacted to the 
mechanical signals from the 
moving prey, producing a 
strong shock and striking in 
the direction of the fish. But 
they repositioned mid-strike, 
capturing and attempting to 
feed on the rod instead, even 
when it moved around quickly. 

This sensory system is 


similar to how some bats use 
echolocation, says Catania. 
Nature Commun. 6, 8661 (2015) 


AGROECOLOGY 


Wild flowers are a
pesticide source


Commonly used insecticides 
have been found on wild 
flowers as well as on crops. 

Neonicotinoid pesticides 
applied to the seeds of some 
crops end up in the nectar 
and pollen of adult plants, so 
the chemicals are a suspected 
cause of the global decline in 
bee populations. Because most 
crops flower only briefly, it 
was unclear how bees could be 
exposed to enough pesticide to 
feel toxic effects. Now Cristina 
Botias and her colleagues 
at the University of Sussex 
in Brighton, UK, show that 
these chemicals are present 
in the pollen of wild flowers 
growing near fields where 
neonicotinoids were used. 

The team measured 
neonicotinoid levels in pollen 
sampled from fields of oilseed 
rape (Brassica napus), nearby 
wild flowers and local beehives, 
and estimated that 97% of these 
compounds that were brought 
back to beehives originate from 
wild flowers. 

The wild flowers had higher 
levels of insecticide in their 
pollen than crop plants did, and 
they bloom for much longer. 
Environ. Sci. Technol. http://doi.org/8bk (2015)


Cheap MRI uses
small magnets


A technique for magnetic resonance imaging (MRI) could provide fast
brain scans at a fraction of the cost of conventional machines.

Most MRI scanners require large magnets to generate a strong enough
magnetic field to penetrate soft tissue. A team led by Matthew Rosen at
Harvard Medical School in Boston, Massachusetts, has demonstrated a
way to capture an image using magnetic fields that are 450 times weaker
than those used by current machines, and at one-twentieth of the cost.

The team engineered a radio-frequency coil that could pick up the faint
radio signals generated as a result of the weak magnet and used
data-collection techniques that speed up image reconstructions.

Although the resulting images have a lower resolution than do those
from large MRI scanners, they can still reveal major abnormalities such
as signs of traumatic brain injury or stroke, Rosen says.

Sci. Rep. 5, 15177 (2015)


SOCIAL SELECTION


Popular topics
on social media


A call for preprints at meetings


In what has been called “gaygenegate” in some corners of the
Internet, a conference presentation on 8 October about the
genetics of homosexuality in men has come under intense
scrutiny. The talk also prompted questions about whether
scientists working on controversial topics should post
unreviewed preprints of their findings before presenting
them at a meeting. Statistician Andrew Gelman of Columbia
University in New York, who criticized the homosexuality
study’s statistical analysis, wrote in a blog post that the lack
of a peer-reviewed paper or preprint made it difficult for people to
evaluate the work. Other researchers countered that conferences are
meant to be forums for early, unpublished work.


Village-dog DNA
hints at origins


DNA from free-roaming ‘village dogs’ shows greater genetic diversity
than that of pure-bred dogs, and could help to settle debates about
where dogs were domesticated.

Humans domesticated dogs from wolves more than 15,000 years ago, but
researchers disagree about whether that happened in Europe, East Asia,
the Middle East or elsewhere. A team led by Adam Boyko at Cornell
University in Ithaca, New York, analysed the genomes of 549
free-breeding village dogs from around the world, as well as 4,676
pure-bred dogs belonging to 161 breeds. Genome-wide patterns of
ancestry in the village dogs hint at a central Asian origin for
domestic dogs, followed by population expansions in East Asia.

The researchers say, however, that more-extensive studies of DNA from
diverse dogs are needed to pinpoint the origins of man’s best friend.

Proc. Natl Acad. Sci. USA http://dx.doi.org/10.1073/pnas.1516215112 (2015)


CORRECTION 

The print version of the 
Research Highlight ‘Corals 
cope with acidified waters’ 
(Nature 526, 296-297; 
2015) incorrectly stated that 
ocean water is being acidified 
when in fact it is becoming 
less alkaline; the online title 
was changed to reflect that. 

It also said coral-made fluid 
was less acidic than reef 
waters; in fact, the fluid had 

a higher pH. And it said that 
some corals can control the 
pH of surroundings, whereas 
they control their internal pH. 


SEVEN DAYS
The news in brief


PEOPLE
Marcy resigns 


Astronomer Geoffrey Marcy resigned from the University of California,
Berkeley, on 14 October, following revelations that he had violated his
university's sexual-harassment policies. In response, the American
Astronomical Society is updating its code of ethics to include
guidelines and practices for dealing with misconduct. Marcy, a pioneer
in the field of exoplanets, has also terminated his relationship with
the Breakthrough Listen project to search for extraterrestrial
intelligence, and been removed from an adjunct position at San
Francisco State University. See page 483 for more.


EVENTS 


New Ebola cases 

The World Health 
Organization (WHO) reported 
two new cases of Ebola in 
Guinea on 16 October, ending 
a two-week period in which no 
new cases had been detected 
across West Africa. Contacts of 
both individuals will receive an 
experimental Ebola vaccine as 
part of an ongoing clinical trial. 
The WHO does not consider a 
region Ebola-free until 42 days 
have passed without a new case. 


Harvard and China
Harvard University in Cambridge, Massachusetts, has unveiled a
collaborative environmental research project in China. The
US$3.75-million venture, announced on 15 October, will enable
atmospheric scientist Michael McElroy to work with Chinese researchers
on climate change, energy security and sustainable development. Based
at the Harvard Center Shanghai, it will include studies in economics,
engineering, atmospheric science and environmental health related to
sustainability. The collaboration is the first initiative of the
Harvard Global Institute, a funding mechanism launched the same day, to
encourage interdisciplinary collaborative research overseas.


Pollen-coated honeybee photo wins gold
This extreme close-up of a honeybee (Apis mellifera) eye covered in
dandelion pollen grains won the annual Nikon Small World
Photomicrography Competition. The contest showcases microscopic images
captured worldwide by scientists, artists and others. Australian
secondary-school teacher Ralph Grimm, a former beekeeper, snapped the
photo, devoting four hours to mount and light the eye and focus the
image. Second prize went to an image of a mouse colon colonized with
human microbiota, and in third place was a picture of the trap of a
humped bladderwort (Utricularia gibba), a freshwater carnivorous plant.


Whaling fight
Australia’s environment minister said on 19 October that the country is
taking legal advice over Japanese plans to restart whaling in the
Southern Ocean, claiming that Japan is attempting “to exclude itself
from the International Court of Justice in matters relating to future
whaling activities”. The court declared Japan's whaling in the region
illegal in 2014, ruling that it was not strictly for scientific
purposes. Minister Greg Hunt said that Australia had met with the
Japanese government to discuss the latter’s apparent attempt to
sidestep the court.


Oil-industry pledge
Officials from ten of the world’s largest oil and gas companies,
including BP and Shell, endorsed on 16 October the goal of limiting the
increase in average global surface temperatures to 2°C. While calling
on governments to create “clear stable policy frameworks”, the
companies committed to increasing investments in clean-energy
solutions, including energy efficiency and technologies that would
allow carbon dioxide from industrial plants to be captured and
sequestered underground. Released less than two months before the
United Nations climate summit in Paris, the statement was met with
scepticism from environmentalists, who say that the industry is still
fighting meaningful climate regulations.


WHOI cyberattack 
The Woods Hole 
Oceanographic Institution 
(WHOI) in Massachusetts 
told its staff on 13 October 
that it had been the target of a 
cyberattack. An investigation 
suggests that the attack 
originated in China. The 
institution does both classified 
and unclassified oceanographic 
research; no classified 
information was accessed 
during the breach, it says. The 
attack, which targeted data 




and e-mail, began as early as 
February 2013 and was not 
detected until June 2015. 


Canada leans left 
The Liberal Party triumphed 
over the Conservatives in 
Canada’s 19 October general 
election. Incoming prime 
minister Justin Trudeau 
pledged before the election to 
appoint a chief science officer 
to make government science 
“fully available to the public”. 
The previous Conservative 
government attracted large- 
scale protests from researchers 
who accused its leader Stephen 
Harper of muzzling federal 
scientists and cutting research 
budgets. See go.nature.com/27d1td 
for more. 


POLICY 


Halt to Arctic oil 
The US Department of 
the Interior on 16 October 
cancelled a pair of oil and 
gas lease sales in the Arctic, 
citing low oil prices and 
weak industry interest. The 
decision comes after the oil 
company Shell suspended its 
Chukchi Sea Arctic drilling 
programme in September 
after finding less oil and gas 
than expected. The lease sales 
had been planned for 2016 in 
the Chukchi Sea and 2017 in 
the Beaufort Sea (pictured). 
The department also denied 
requests from Shell and 
Norwegian oil company 
Statoil to put their existing 
Arctic leases on hold and 
resume them at a later date. 


TREND WATCH 


Drug testing in animals is at 
substantial risk of bias because 
of poor study design, suggests an 
analysis of thousands of papers 
(M. R. Macleod et al. PLoS Biol. 
http://doi.org/8cf; 2015). Many 
publications do not mention bias-avoiding 
methods. These include 
randomizing animals’ assignment 
to treatment or control arms; 
calculating the sample size 
necessary for a statistically robust 
result; and ‘blinding’ researchers 
as to which animals were assigned 
which treatment. See go.nature.com/j7ipin 
for more. 


Oversight overhaul 
The US National Institutes of 
Health (NIH) may scale back 
its review of research involving 
human gene therapy. The 
agency's Recombinant DNA 
Advisory Committee reviews 
all research protocols involving 
gene transfer into humans, 

but on 16 October, the NIH 
proposed that such research be 
reviewed only if requested by 
local ethics committees. A 2013 
report from the US Institute 

of Medicine argued that the 
current level of oversight is 

no longer needed. The public 
is invited to comment on the 
proposal until 30 November. 


Emissions action 
The White House announced 
a series of executive actions 
and voluntary industry 
commitments on 15 October 
to reduce emissions of 
hydrofluorocarbons (HFCs), 


a group of potent greenhouse 
gases that are commonly 
used as refrigerants. The US 
Environmental Protection 
Agency said that it will 
pursue new rules governing 
the use and management of 
HFCs, and the Department 
of Defense announced plans 
to use alternative chemicals 
at some of its facilities and 

on ships. Combined with 
earlier announcements, the 
commitments aim to reduce 
global greenhouse-gas 
emissions by the equivalent of 
more than 1 billion tonnes of 
carbon dioxide by 2025. 


Wellcome boost 
The Wellcome Trust, Britain’s 
largest biomedical-research 
charity, announced plans on 
21 October to spend £5 billion 
(US$7.7 billion) over the next 
five years. The organization 
has disbursed £6 billion over 
the past 10 years, including 
£728 million in 2014. Jeremy 
Farrar, the trust’s director, 
says that Wellcome “will do 
more of what we’re already 
doing”, and identified other 
areas such as combating 
drug-resistant infections and 
research that involves mining 
medical records. The splurge 
is driven by the performance 
of Wellcome’s £18 billion 
endowment fund, says 
Farrar. From 2013 to 2014, 
its investment assets grew by 
about 10%, to £18 billion. 


ANIMAL STUDIES POORLY DESIGNED 

Studies testing drugs in animals rarely report the use of basic methods 
to avoid biased conclusions. 


SEVEN DAYS | THIS WEEK 


23-24 OCTOBER 
The University 
of Pittsburgh in 
Pennsylvania hosts 
a conference on the 
biology and control of 
nausea and vomiting. 
emesis2015.com 


25-28 OCTOBER 
Is time travel possible? 
Scientists gather in 
Turin, Italy, to debate 
causality and non-locality 
in physics, 
in relation to time 
machines. 
www.timemachinefactory.eu 


25-29 OCTOBER 
Molecular and cellular 
biologists meet in Kyoto, 
Japan, for a Keystone 
Symposium on 
molecular mechanisms 
and treatment strategies 
for diabetes. 
go.nature.com/we3huv 


CORRECTION 

The story ‘Telescope start’ 
(Nature 526, 298; 2015) 
stated that the 23-metre 
telescope being built in 

the Canary Islands would 
form part of the Cherenkov 
Telescope Array. However, 
the telescope is a prototype 
and will not necessarily 
become part of the array, 
which has yet to be finally 
approved or funded. 


Exoplanet-hunter pioneer Geoffrey Marcy has resigned from his job and some of his research projects. 


SOCIETY 


Astronomy moves 
to end harassment 


US researchers unite against abuse in the workplace. 


BY ALEXANDRA WITZE 


Twenty years ago this month, Geoffrey 
Marcy narrowly missed out on becom- 
ing the first astronomer to find a planet 

orbiting a distant star. Beaten to the discovery 

by a pair of Swiss scientists, Marcy and his 
colleagues went on to rack up an astonishing 
list of other extrasolar planet sightings, from 
the first multiplanet system around a Sun-like 
star to the first Neptune-sized exoplanet. It 
was the kind of career that triggered talk of 

a Nobel prize. 

On 14 October, Marcy resigned from the 


University of California, Berkeley, in the wake 
of a sexual-harassment scandal that involved 
multiple students over many years. He leaves 
a field that has expanded far beyond his early 
influence — and many researchers who hope 
that the harassment revelations will lead to 
improvements in working conditions for 
women in astronomy. 

Having such a prominent researcher 
involved in a high-profile harassment case may 
prompt more scientists to recognize the deep- 
seated problem, says Julianne Dalcanton, an 
astronomer at the University of Washington in 
Seattle. “Perhaps moving forward, more people 


will be part of the solution,” she says. Already, 
astronomy departments at many universities 
are starting to hold open discussions about 
how to prevent abuses on their campuses. 

“The damage he has caused and the culture 
which enabled it to happen still need to be 
addressed,” adds Laura Lopez, an astronomer 
at the Ohio State University in Columbus. 

Statistics paint a grim picture for US women 
in astronomy. Just 14% of full professors in the 
field at US universities are women, according 
to a 2013 survey by the American Astronomi- 
cal Society (AAS) Committee on the Status of 
Women in Astronomy. Studies suggest that 
sexual harassment is pervasive in academia. 
In an April 2015 survey of students, faculty 
members, staff and alumni in Marcy’s depart- 
ment at Berkeley, more than one-third of 45 
women reported some form of sexual or gen- 
dered discomfort brought on by the actions of 
other members of the department. 


SCANDAL ERUPTS 

Prompted by formal complaints, Berkeley 
investigated Marcy and concluded in June that 
he had violated campus sexual-harassment pol- 
icies in incidents involving students between 
2001 and 2010. (Marcy became a professor at 
the university in 1999.) The revelations became 
public in a 9 October article on BuzzFeed News. 
Berkeley administrators said that because they 
could not unilaterally discipline a faculty mem- 
ber, they had reached an agreement with Marcy 
in which he would be stripped of faculty career 
protections and be subject to sanctions or dis- 
missal if he violated policies again. 

Astronomy faculty members and students 
at Berkeley protested against the university's 
response, and Marcy resigned. San Francisco 
State University, where he worked before 
Berkeley and had retained an adjunct position, 
terminated its relationship with him. 

The fate of Marcy’s research projects remains 
unclear. That includes his work helping to 
lead NASA’s planet-hunting Kepler mission, 
and with the Automated Planet Finder, a 
robotic 2.4-metre telescope at Lick Observa- 
tory in northern California that searches for 
rocky planets. Marcy has resigned as a prin- 
cipal investigator of Breakthrough Listen, a 
US$100-million project announced in July to 
accelerate the search for signs of intelligent life 
in the Universe. The mission will continue to 
be overseen by Berkeley astronomers Andrew 
Siemion and Dan Werthimer, among others. 


Although Marcy was a pioneer in exoplanet 
research, the field has grown far beyond 
him, says Mercedes López-Morales, an astronomer 
omer at the Harvard-Smithsonian Center for 
Astrophysics in Cambridge, Massachusetts. 
She hopes that the quick and nearly universal 
condemnation of Marcy’s actions will encourage 
young researchers, particularly women, to pursue 
exoplanet research. “Marcy represents 
the exception, not the rule, in our field,” says 
López-Morales. 

Among those swift responses was 
a statement from the 
AAS. The Marcy case 
“offers an important opportunity for all of us 
to discuss, within our groups and institutions, 
what responsibilities we have as professionals 
and how we can ensure that everyone in our 
profession is afforded a safe, supportive workplace”, 
it reads. AAS president Meg Urry, an 
astronomer at Yale University in New Haven, 
Connecticut, is a long-time advocate for 
improving working conditions for women. 
After the Marcy revelations, Urry set up a task 
force to develop procedures and sanctions 
related to misconduct, for inclusion in the soci- 
ety’s code of ethics. 

“We should all take a close look at our 
own institutions and professional networks 
and ask what we might do differently,” says 
Heather Knutson, an exoplanet researcher 
at the California Institute of Technology in 
Pasadena. 

Compared with other fields of science 
and other countries, US astronomy has been 
relatively progressive in tackling workplace 
issues for women and other minorities. The 
AAS Committee on the Status of Women in 
Astronomy runs a website with discussion and 
specific advice on topics such as bullying. Vol- 
unteers have also started a programme called 
Astronomy Allies, which serves as a buddy 
system to walk people home from astronomy- 
related parties and conference events. 

Other research areas should also pay attention, 
Lopez says. “Sexual harassment is a problem 
endemic to all fields in academia,” she says, 
“and Marcy’s case should serve as a reality 
check for everyone, not just for astronomers.” 


MORE ONLINE 


MORE NEWS 
● Pluto’s geology is unlike any other in the Solar System go.nature.com/7pmbnp 
● Vast cosmic voids merge like soap bubbles go.nature.com/fjdgoc 
● First report of sexually transmitted Ebola go.nature.com/svhmth 
● Teeth from China reveal early human trek out of Africa go.nature.com/mevigq 


484 | NATURE | VOL 526 | 22 OCTOBER 2015 


© 2015 Macmillan Publishers Limited. All rights reserved 


Google provides colourful bikes for employees to ride around its California campus. 


BIOTECHNOLOGY 


Tech titans lure 
life-sciences elite 


As Google and others turn to health care, biomedical 
luminaries flock to Silicon Valley. 


BY ERIKA CHECK HAYDEN 


Free tasty food, brightly coloured bicycles 
and high salaries are well-known hallmarks 
of the Googleplex, Google’s 
famed headquarters in Mountain View, 
California. But it was not these perks that led 
cardiologist Jessica Mega to pause her thriving 
academic career at Harvard Medical School to 
become the chief medical officer of the company’s 
life-sciences team. She was lured by 
the ambitions of the effort, soon to be incorporated 
under Google’s parent firm Alphabet. 
Nurtured by Google’s expertise in data 
analytics and engineering, the biology team 
is expected to create miniaturized electronic 
devices and to use these and other means to 
collect and analyse more health data, more 
continuously, than is possible today. 

“What I find compelling is the immersion 
of people with strong technology backgrounds, 
hardware and software engineers, 
sitting next to people like myself,” says 
Mega. “The impact feels very, very large.” 

Mega’s decision to move in March to 
Google was one in a string of announcements 
by top-flight scientists and physicians who 
are enlisting in the mission, and pioneering 




a new type of career path in the process. 
Although academic researchers from fields 
such as computer science and engineering 
have led innovative Google projects (such as 
the Internet-connected eyewear known as 
Glass), Google and other technology com- 
panies are increasingly recruiting life scien- 
tists as Silicon Valley broadens its reach into 
health care. “I have a feeling we're going to see 
a lot more recruitment of leading lights,” says 
Eric Topol, director of the Scripps Transla- 
tional Science Institute in La Jolla, California. 

In September, Thomas Insel, director of 
the US National Institute of Mental Health 
in Bethesda, Maryland, announced that 
he would soon be joining Google’s life- 
sciences company to help develop ways to 
apply technology in mental health. And last 
year, molecular biologist Cynthia Kenyon, a 
leader in ageing research at the University of 
California, San Francisco, joined the Google- 
backed biotech company Calico in San Fran- 
cisco, California. 

Cardiologist Euan Ashley of Stanford 
University, which sits in the thick of Silicon 
Valley, says that academic data scientists are 
constantly tempted by the companies that 
await them just off campus. “They're being 
continuously recruited away,” he says. “We're 
in competition with Google and other tech 
companies, and generally they can pay a lot 
more than Stanford can.” 

But money is not the only lure. Silicon 
Valley offers strong technology resources that 
are hard to access in academia, Topol says, as 
well as the opportunity to pursue goals that 
are difficult to reach for in academia, where 
scientists are not typically rewarded for pur- 
suing real-world applications. “The resources 
are exponentially greater than what you can 
get through academic circles. And the met- 
rics are different: instead of publications, it’s 
just, ‘Get stuff done’,” he says. 

Getting stuff done was foremost in the 
mind of electrical engineer Brian Otis when 
he left his tenured position at the University 
of Washington in Seattle in 2012 to work for 
Google. He went there to work on a ‘smart’ 
contact lens for people with diabetes that 
measures the level of glucose in tears. When 
the project began, it faced two big questions: 
first, could the electronics needed to make a 
functional wireless glucose sensor be embed- 
ded in a wearable contact lens? And second, 
would it provide the relevant measurements 
of glucose levels? The motivation and means 
to answer those unknowns was a powerful 
incentive, Otis says. He recalls thinking: “If I 
come into Google life with these questions, 
I have the entire runway and resources to 
answer these two questions.” 

The project was successful; drug giant 
Novartis licensed the contact-lens technology 
last year and Otis is now director of the Google 
life-sciences team’s hardware and medical- 
device development. “To go all the way from 


foundational first principles to execution of 
vision was the initial draw, and that’s what has 
continued to keep me here,” he says. 

Apple, too, has entered the health-care 
game. In March, it debuted ResearchKit, a 
framework through which researchers can 
write apps that collect data from patients’ 
mobile phones. And in April, IBM launched 
IBM Watson Health and the Watson Health 
Cloud, services that use the company’s cogni- 
tive computing technology to process large 
amounts of health data from diverse sources. 
The service could help physicians to man- 
age patients’ health by streaming data from 
personal electronic devices, or enable drug 
companies to manage clinical trials more 
efficiently with cloud computing. Intel, 
meanwhile, is developing cloud-computing 
services to provide more personalized cancer 
care; and Facebook, Microsoft and Amazon 
are all also getting involved. 

But Google’s approach sets it apart: the company 
expends more resources on potential 
health applications and is exploring 
in more directions than others are. 
Observers estimate that Google puts 
more than a billion dollars per year 
into life-sciences 
research, although the company says that it 
does not break down its spending in that way. 

Google's life-sciences team is working on 
a range of projects that involve developing 
new ways of monitoring health. As well as 
the smart contact-lens project, there is the 
Baseline Study, which aims to collect large 
amounts of data about people to better quan- 
tify health and disease, with the goal of ear- 
lier and more-effective preventive care. The 
company also funds a huge array of exter- 
nal collaborations with academics. Google 
Genomics, for instance, is studying the appli- 
cation of cloud computing to genomics, and 
Calico has signed a slew of collaborations 
with companies and academic institutes. 

“They're reaching out to academia in a way 
that biotechnology companies often don't,” 
says cell and molecular biologist Judith 
Campisi of the Buck Institute for Research 
on Aging in Novato, California. That enables 
scientists to collaborate with Google instead 
of joining it wholesale. 

“For some academics, joining a technology 
company would be an exciting new oppor- 
tunity,” says physician Steven Hyman of the 
Broad Institute of MIT and Harvard in Cam- 
bridge, Massachusetts. But it is “not a likely 
destination for those interested in mitigating 
risk,” he says. “After all, the life-science goals 
of the Googles, Apples and Microsofts of the 
world are likely to change in the near term 
as the companies explore an area that is new 
to them.” 
Neutrino probe 
is key priority 
for US physics 


Nuclear-science wish list 
also includes particle collider. 


BY DAVIDE CASTELVECCHI 


Two large science experiments head a 
wish list drawn up by US nuclear phys- 
icists for the next decade: a quest to 

uncover the nature of neutrinos and a particle 

collider to study the forces that bind quarks. 

The big-ticket items, each of which would 
cost hundreds of millions of dollars, are among 
the top priorities highlighted by the Nuclear 
Science Advisory Committee (NSAC) on 
15 October. Every 5-7 years, this panel of 
high-level nuclear physicists presents a long- 
term plan to the US Department of Energy and 
National Science Foundation, after consulting 
the US nuclear-physics community. 

The agenda assumes that US funding for 
nuclear science will increase by 1.6% per year 
above inflation — a realistic scenario, says 
NSAC chair Donald Geesaman, a physicist at 
Argonne National Laboratory in Illinois. “We 
have exciting science to do, and we are not asking 
for large increases,” he says. 

The neutrino experiment, construction of 
which could begin by the end of the decade, 
would search for a theorized rare form of radio- 
active decay in which two identical neutrinos 
annihilate one another — an event that would 
imply that neutrinos are their own anti-particles. 
It could provide a way to measure the tiny mass 
of neutrinos and help to explain why the Uni- 
verse has lots of matter but almost no antimatter. 

Experiments around the world using materi- 
als such as liquid xenon have failed to detect the 
event, known as neutrinoless double β decay. But 
the NSAC report says that an experiment using 
a tonne or more of material — about ten times 
more than any previous attempt — could either 
find or rule out the phenomenon. 

Another priority, on which Nature reported 
in May (see Nature 521, 272; 2015), is a parti- 
cle accelerator that would collide electrons with 
protons or heavy ions to investigate gluons, 
which carry the force that binds quarks. But 
construction would have to wait until the 
2020s because NSAC’s top priority is to com- 
plete and maintain existing facilities, such as 
the Relativistic Heavy Ion Collider (RHIC) at 
Brookhaven National Laboratory in Upton, 
New York. RHIC faced closure two years ago, 
but an improved budgetary position means it 
can now be sustained into the next decade. 
Russian President Vladimir Putin has declared certain types of civilian research classified. 


Secret service to 
vet manuscripts 


Moscow biology department requires research papers to be 
approved to comply with law on state secrets. 


BY QUIRIN SCHIERMEIER 


A biology institute at Russia’s largest 
and most prestigious university has 

instructed its scientists to get all 
research manuscripts approved by the security 
service before submitting them to conferences 
or journals. 

The instructions, which come in response 
to an amended law on state secrets, appear in 
minutes from a meeting held on 5 October 
at the A. N. Belozersky Institute of Physico- 
Chemical Biology at Lomonosov Moscow 
State University (MSU). 

The Russian government says that the 
amendment is not designed to restrict the 
publication of basic, non-military research. 
But scientists say that they believe institutes 
across the country are issuing similar orders. 

“This is a return to Soviet times when in 
order to send a paper to an international jour- 
nal, we had to get a permission specifying 
that the result is not new and important and 
hence may be published abroad,” says Mikhail 
Gelfand, a bioinformatician at MSU. 

In 1993, the government passed a law oblig- 
ing scientists in Russia to get permission from 
the Federal Security Service (FSB) before 
publishing results that might have military or 


industrial significance. This mainly covered 
work that related to building weapons, includ- 

ing nuclear, biological and chemical ones. 
However, in May, President Vladimir Putin 
used a decree to expand the scope of the law to 
include any science that can be used to develop 
vaguely defined “new products”. The amend- 
ment was part of a broader crackdown that 
included declaring the deaths and wounding 
of soldiers during peacetime a secret; this was 
prompted by accusations that Russian soldiers 
are involved in conflict in Ukraine. 

Since then, rumours have emerged that 
Russian universities and institutes 
are demanding that 
manuscripts be approved before submission to 
comply with the amendment. The minutes from 
the Belozersky Institute meeting confirm this. 
“Be reminded that current legislation obliges 
scientists to get approval prior to publication of 
any article and conference talk or poster,” they 
say. They note that the rules apply to any pub- 
lication or conference, foreign or national, and 
to all staff “without exception”. 

Scientists will need to seek permission from 
the university’s First Department — a branch of 




the FSB that exists at all Russian universities and 
research institutes, says Viacheslav Shuper, a 
geographer at the Russian Academy of Sciences 
in Moscow and MSU. He says that MSU geog- 
raphers have been given similar instructions. 

The minutes tell scientists to seek permis- 
sion “despite the obvious absurdity of the 
whole situation”. Vladimir Skulachev, director 
of the Belozersky Institute, did not respond to 
Nature’s queries as to how the changes might 
affect research in his department. 

Shuper and other academics say researchers 
across Russia have complained that their insti- 
tutes are also asking for manuscript approval. 
“Many scientists in Russia don't dare to speak 
openly,” says Shuper. “But I know that many are 
very unhappy about the degradation of their 
academic freedom.” 

Letting bureaucrats decide whether any 
piece of science is a state secret is not just 
nerve-wracking, but also burdensome, he says. 
For example, at some institutes, scientists who 
have written papers in English for foreign pub- 
lication are obliged to translate them into Rus- 
sian for the sake of the security service. 

The changes are also bad for science, says 
Fyodor Kondrashov, a Russian biologist at the 
Centre for Genomic Regulation in Barcelona, 
Spain. “The problem is that it appears that all 
scientific output is being treated as potentially 
classified,” he says. “This creates an unhealthy 
research climate with some scientists prefer- 
ring not to share information — not to give a 
talk at a conference abroad, for example. I fear 
that the authorities will choose to apply this 
law selectively against their critics.” 

Sergey Salikhov, director of the Russian sci- 
ence ministry's science and technology depart- 
ment, told Nature that the government does 
not intend the amendment to restrict the pub- 
lication of basic research. He says that it is not 
ordering universities or security services to pro- 
actively enforce the law over civilian research. 

But the amendment leaves interpretation to 
the security services and science administra- 
tors, who tend to be over-zealous, says Gelfand. 
“Basically, anything new and potentially use- 
ful can now be interpreted to be a state secret,” 
says Konstantin Severinov, a molecular biolo- 
gist with the Skolkovo Institute of Science and 
Technology, who graduated from MSU. 

The demand for approval runs counter to 
government efforts to strengthen and inter- 
nationalize Russian science, says Severinov. 
The government aims to see 5 of the country’s 
universities enter the top 100 in the world 
rankings by 2020, and is keen to attract lead- 
ing foreign scientists to Russia. 

Gelfand says that he will not comply with the 
rules imposed by his institute, and he encour- 
ages others to follow suit. “A sad sign of overall 
deterioration here is that many are sheep- 
ishly following any absurdity instilled by the 
bureaucrats,” he says. “I am going to ignore it 
and hope that a sufficient number of colleagues 
would do the same.” SEE EDITORIAL P.475 
Gene therapy sees early success 
against progressive blindness 


Treatments for inherited eye diseases show promise in clinical trials, but worries linger over 
how long the beneficial effects will last. 


BY HEIDI LEDFORD 


Ophthalmologist Eric Pierce is no stranger 
to difficult conversations. During his 

years at the Boston hospital Mass- 
achusetts Eye and Ear, he has counselled 
both adults and the parents of young children 
who have been newly diagnosed with genetic 
retinal diseases that will ultimately leave 
them blind. 

But progress against one such disease has led 
Pierce to change how he presents his diagnosis. 
“We’re on the threshold of a new era,” he now 
tells anxious parents. “I do believe there will 
be a therapy for your child so that they won't 
experience the full course of this disease.” 

Pierce's optimism is grounded in early data 
from tests of gene therapy in animals and 
humans. On 12 October, researchers reported 
success with using gene therapy in dogs against 
a form of retinitis pigmentosa¹, a genetic disease 
that causes light-sensitive photoreceptor 
cells to degenerate over the course of years. The 
results unexpectedly showed that the approach 
worked well even in mature dogs that had 
already lost some photoreceptor cells, a sign 
that the strategy might also work in humans, 
who have often reached that stage well before 
diagnosis. 

And on 10 October at the Retina Society 
annual scientific meeting in Paris, a biotech- 
nology company presented encouraging 
data from a trial in humans. The company 
found that its gene therapy for a degenera- 
tive eye disease caused by a mutation in the 
RPE65 gene improved sensitivity to light in all 
21 treated patients. Although other research 
groups using the same approach have seen 
some reversal of similar gains, the company, 
Spark Therapeutics of Philadelphia, Penn- 
sylvania, plans to apply to the US Food and 
Drug Administration for regulatory approval of 
its therapy in 2016. If the treatment is approved, 
the company could be the first to bring a 
gene therapy to market in the United States. 

“This is definitely a time of great promise,” 
says Stephen Rose, chief research officer at the 
Foundation Fighting Blindness in Columbia, 
Maryland. “It moves the whole field of gene 
therapy forward.” 

Gene therapy has endured a bumpy road. 
After years of promising advances, the field 


almost came to a screeching halt in 1999. A 
death in a trial of gene therapy to treat an inher- 
ited metabolic disorder caused a scandal and 
sowed fears about the technology's safety. But 
an ardent few continued in the face of wide- 
spread scepticism and limited funding. The 
most notable success was a treatment for the 
genetic immune deficiency disease X-SCID, 
although it caused leukaemia in some patients. 

It was during this time that some gene- 
therapy researchers began to see a glimmer 
of promise in treating eye disorders. The eye 
is partially shielded from the immune system, 
reducing the likelihood of an immune attack 
on the virus used to introduce the genes. (Such 
an immune response was blamed for the 1999 
death.) The eye is also relatively easy to access, 
allowing surgeons to inject the virus near to the 
cells in which the gene is needed. Because more 
than 200 genes are associated with retinal dis- 
orders, the opportunity for genetic correction 
was clear. 

Researchers began with mutations in RPE65
that are associated with one type of vision
loss. The enzyme encoded by RPE65 is cru- 
cial for converting light into electrical signals 
that travel to the brain, and for sustaining the 
eye’s photoreceptors. Without a functioning 
enzyme, the photoreceptors gradually degrade, 
progressively crippling vision. The researchers 
hoped to halt this process by using a virus to 
shuttle a functional RPE65 gene into the eye. 
In 2007, three teams launched the first clini- 
cal trials aiming to do just that, and included 
a team that would go on to found Spark in 
2013. Positive results published in 2008
rejuvenated interest in gene therapy, says Luk 
Vandenberghe, a virologist who studies gene 
therapy at Harvard Medical School in Boston. 
“They truly validated the concept of gene ther- 
apy that people had been pursuing for decades,” 
he says. “The field has really turned around.” 
But earlier this year, two of those three teams 
announced setbacks. They reported that the
effects were waning in some patients as early as
one year after treatment.


VISION FOR THE FUTURE 


Broader reach for gene therapy 


Although gene therapy is showing promise 
against vision loss caused by mutations in 
the gene RPE65, such mutations account for 
less than 2% of the total burden of inherited 
diseases that cause retinal degeneration. 
Hundreds of genes have been implicated in 
such disorders; tackling them individually 
would require laborious research, clinical 
testing and regulatory approval for each 
one. Instead, researchers are seeking ways 
of using gene therapy to treat a larger set of 
patients. 

Some are looking for ways to protect 
neurons in the eye from degeneration, 
regardless of which gene is involved in the 
process. Such ‘neuroprotective’ approaches 
under consideration include inducing cells 
to express a protein called RdCVF, which 
protects the cone cells in the eye that enable 
colour vision. 

At Harvard Medical School in Boston, 
Massachusetts, Connie Cepko’s laboratory is
testing the effects of inducing the expression
of a gene called NRF2, which activates 
antioxidant responses, to see whether those 
defences could protect photoreceptors from 
damage. 

Others are taking a second look at GDNF, 
a neuroprotective protein that has been 
explored for its possible use as a treatment 
for Parkinson’s disease and that may also 
protect photoreceptors in the eye. 

Researchers at GenSight Biologics in 
Paris and at RetroSense Therapeutics in 
Ann Arbor, Michigan, are taking a different 
tack. They hope to replace damaged 
photoreceptors by inducing the eye’s 
retinal cells to express light-sensitive 
proteins called channelrhodopsins. But 
channelrhodopsins are less sensitive to light 
than the eye’s natural photoreceptors, so 
GenSight is also developing special goggles 
that patients would wear to amplify the light 
signals reaching the eye. H.L. 

At Spark, chief scientific officer Kathy
High says that her team has yet to see any 
decline in its patients even eight years after treat- 
ment. She notes that subtle differences in the 
protocol might have given Spark's treatment an 
edge. The virus that Spark engineered may have 
expressed RPE65 at particularly high levels, she 
notes, and the company also adds a surfactant 
molecule when injecting the virus to prevent 
it from sticking to the needle during injection. 
But vision scientist Artur Cideciyan of the 
University of Pennsylvania in Philadelphia, 
who works with one of the teams that reported 
a decline in gains after gene therapy, is still not
convinced that Spark’s results will endure. He
says that Spark has not yet announced data as 
detailed as those that the other teams used to 
measure the growth — and then decline — in 
their patients’ visual fields. 

Even so, the diminishing effect in human 
trials need not indicate a fundamental flaw in 
the approach — or in gene therapy as a whole, 
says Vandenberghe (see ‘Broader reach for 
gene therapy’). “All the tweaks haven't been
fully worked out,” he says.

Pierce, as a clinician, considers even tentative 
progress a huge achievement. He recalls the 
time when the only support that he could offer
some of his patients was to recommend dietary
supplements that might slow the disease. “Years 
of efficacy in a chronic degenerative disease is 
a huge success,” he says. “And to have some 
optimism in the conversation is fantastic.”

Cuba forges links with
United States to save sharks


Improved diplomatic relations feed a budding environmental partnership. 


BY JEFF TOLLEFSON 


Cuba is surrounded by sharks. Fishermen catch them, residents eat them
and, increasingly, tourists are coming to see them. Now the island
nation is gearing up to manage them, and its efforts are bolstering
a nascent environmental partnership with the United States.

“It's a big step forward for Cuba and the
region,” says Jorge Angulo-Valdés, head of the
Marine Conservation Group at the University
of Havana’s Center for Marine Research and a
visiting professor at the University of Florida in
Gainesville. “It's time for us to get together, identify
common goals in resource management and
make them work.”

On 21 October, Cuba plans to release a man- 
agement plan that will lay the groundwork for 
research and, eventually, regulations to protect 
extensive but largely undocumented shark and 
ray populations. Roughly half of the 100 spe- 
cies of shark resident in the Caribbean Sea 
and Gulf of Mexico have been seen in Cuban 
waters, including some — such as the whitetip 
(Carcharhinus longimanus) and longfin mako 
(Isurus paucus) — that have experienced sharp 
declines elsewhere. The Cuban government has 
consulted with environmentalists and academ- 
ics from the United States and other countries in 
developing the plan. 

“Cuba is a kind of biodiversity epicentre for 
sharks,” says Robert Hueter, director of the 
Center for Shark Research at the Mote Marine 
Laboratory and Aquarium in Sarasota, Florida, 
who is one of those working with the Cuban
scientists. “The science is not at a level yet to do
rigorous stock estimates, but we are moving in
that direction with this plan.”

The Caribbean reef shark (Carcharhinus perezii) is one of many species that can be seen in Cuban waters. (Photo: Pete Oxford/Minden Pictures/Getty)

Most of what is known about Cuba’s shark
populations has come from the fishing industry,
which often captures sharks as by-products of
its regular operations. The Cuban government
has already established marine protected areas
along 20% of its coastline and is planning to
expand that network within the 70,000 square
kilometres of its coastal fishery. It has also begun
to regulate the equipment used in fishing, and is
looking to establish catch limits for various fish
species, including sharks.

Both US and Cuban scientists say that the collaboration
is helping to pave the way for more
formal cooperation now that the two cold-war
foes have re-established political relations. In
April, the US National Oceanic and Atmospheric
Administration (NOAA) sent a research
vessel on a cruise around the island with Cuban
scientists. And on 5 October, US secretary of
state John Kerry and Cuban officials announced
at an oceans conference in Chile that the two
nations were finalizing plans to cooperate on
research, education and management in marine
protected areas. The agreement could be finalized
as early as next month, says Billy Causey,
regional director for NOAA’s Office of National
Marine Sanctuaries in Key West, Florida.


POLITICAL IMPETUS 

US environmentalists began pushing the idea 
of cooperation with Cuba on marine conserva- 
tion after the 2008 election of President Barack 
Obama, who pledged during the campaign to 
engage with Cuba. The first signs of real pro- 
gress came in September 2009, says Daniel 
Whittle, who heads the Cuba programme for 
the Environmental Defense Fund (EDF), an 
environmental group based in New York City. 
Then, the United States allowed four Cuban 
scientists, three of whom were marine and 
coastal researchers, to attend a series of meet- 
ings in the country. And in November last year, 
Angulo-Valdés was part of a cadre of Cuban 
scientists that visited the state department and 
several members of Congress. A month later, 
Obama ordered the restoration of diplomatic 
ties with Cuba. 

“It’s slowly beginning to change,” says Whittle,
referring to links between the nations.
“That’s why the announcement in Chile was
so significant: finally the two governments
publicly acknowledged that they are in fact
working directly together on environmental
issues.”

The EDF and other conservation groups 
have been trying to build cooperation between 
Cuba, Mexico and the United States within the 
Gulf of Mexico. NOAA’s April cruise, which
focused on tallying the larvae of bluefin tuna
(Thunnus thynnus) in Cuban and Mexican
waters, marked the first formal government
engagement on that front since Obama's
December announcement, Causey says.

The main question facing the shark-management
plan is whether the Cuban government will
be able to mobilize enough money to
implement it. The EDF and other groups
have been raising
funds to pay for some of the initial work on the
plan, including training fishing crews to identify 
and report the sharks that they catch. But scien- 
tists need to conduct population surveys that 
are independent of those done by commercial 
fisheries, and Cuban research institutions are 
already stretched thin. 
The country has only two operational 
research vessels, and scarce resources to equip 
and operate them. The kind of tags needed to
track shark movements through satellites can 
cost US$2,500 each. So far, Cuba has tagged just 
four sharks with such devices. 

“We have to see how the government imple- 
ments the plan, and how they get around the 
funding problem,” Angulo-Valdés says. “It’s
going to be a challenge.”


CORRECTION 

The News Feature ‘The impenetrable proof’ 
(Nature 526, 178-181; 2015) incorrectly 
stated that Shinichi Mochizuki estimated 
that it would take an expert 500 hours to 
understand his proof. In fact, this was Ivan 
Fesenko’s estimate. The story also stated 
that Fesenko warned Mochizuki against 
speaking to the press, but this was not part 
of their discussion. 

The News Feature ‘Brain, meet gut’ 
(Nature 526, 312-314; 2015) incorrectly 
stated that the US Office of Naval Research 
agreed to commit US$52 million into gut- 
brain research. In fact, the figure is closer to 
$14.5 million over the next 6-7 years. 

The Editorial ‘The worm returns’ (Nature 
526, 294; 2015) gave the wrong date for 
the landmark ‘The mind of the worm’ paper. 
The paper was published in 1986, not 1984. 


© 2015 Macmillan Publishers Limited. All rights reserved 


As a massive El Niño warming builds in the
equatorial Pacific Ocean, researchers hope 
to make the most of their chance to study 
this havoc-wreaking phenomenon. 


By Quirin Schiermeier 


The tropical Pacific seemed out of
sorts this August, as oceanogra- 
pher Kelvin Richards and his team 
cruised along the equator east of 
the Marshall Islands. Six tropical 
cyclones had barrelled across the ocean in the 
previous month, and more were spinning up 
as Richards’ research expedition got under 
way. The sea surface across the region was 
abnormally warm, with water temperatures 
at least 1 °C higher than expected. And when 
the oceanographers peered below the surface, 
they found signs of intense turbulence extend- 
ing hundreds of metres down. 

The team had found itself cruising through 
a spectacular El Niño warming event, one
that may become the strongest ever recorded.
Big El Niños can turn climate conditions in
the Pacific upside down and disrupt weather 
around the globe. The impacts of this one 
have already been felt. Indonesia has suffered 
through a withering drought that has intensified 
fires in forests and agricultural land, and Pacific 
corals are experiencing one of the worst bleach- 
ing events on record. Peru has declared a state 
of emergency in some regions in expectation of
flooding, and farmers in Australia have been put
on alert for expected drought. 

The last time a major El Niño developed, in
1997-98, extreme weather and flooding killed 
thousands and left a quarter of a billion people 
in Asia homeless. It also helped to jack up global 
temperatures to a point never recorded before. 

For Richards, a researcher at the University 
of Hawaii at Manoa, the arrival of the latest El
Niño turned out to be good timing. Such warmings
develop only once or twice a decade, following
no regular schedule, so researchers are
eager to learn how to predict when an El Niño
will hit and how powerful it will grow. That 
means that they must keep close tabs on the 
atmosphere and ocean, from the surface waters 
to the cooler layers hundreds of metres below. 
But getting the necessary data can be difficult. 
It takes years to plan research cruises, so it is 
hard to get a ship into the heart of the Pacific 
in time to study an unpredictable event. When 
Richards applied for ship time back in 2012, he 
had no idea that his trip would happen right as 
a warming episode was gathering strength. “It 
just happened to coincide with our expedition, 
and we gladly took the opportunity,” he says.

Most oceanographers are not lucky enough
to be out at sea this year, but they are taking
advantage of their colleagues’ data, as well as
information flowing in from research buoys and
other sources. One key question that they want
to answer is why every El Niño behaves differently.
“El Niños are not made from a cookie cutter,”
says Michael McPhaden, an oceanographer
with the US National Oceanic and Atmospheric
Administration (NOAA) in Seattle, Washington.
The strength and impact of each El Niño
seem to depend in part on which region of the
Pacific warms up first, but predicting the pattern
of temperature anomalies is tough. “We
would really like to better understand what's
causing the diversity, and how far in advance it
might be possible to predict what type of event
we need to prepare for,” says McPhaden. That
would help forecasters to give warning of com- 
ing droughts and floods months before they hit. 


BAIT AND SWITCH 

The current El Niño is a glaring reminder of
how much scientists need to learn. When it
first started to take shape in 2014, it developed
like many others. There was a weakening in
the easterly trade winds
that normally flow from
South America towards
Asia, carrying heat and
moisture to the western
part of the basin. This allowed warmth to
spread eastwards, and researchers expected
the pattern to be reinforced by westerly
wind bursts that would help to push warm
water eastwards (see ‘Unruly ocean’). When
enough warm water accumulates off the
coast of South America, it prevents the normal
upwelling of cool, nutrient-rich water
from deeper layers. That, in turn, alters fish
populations and typically ruins the anchovy
harvest off the coast of Peru.

But in 2014, the warming along the equator
was less pronounced than in most El
Niño years, and the westerly wind bursts
did not appear as expected. By mid-year,
the anticipated El Niño had completely
vanished.

The RV Falkor has collected data that will aid studies of El Niño.

What had stopped the show, and why the 
Pacific warming spectacularly resurfaced 
12 months later, are questions that are 
puzzling ocean researchers and meteorologists.
The mysteriously reborn El Niño is a
fantastic opportunity for researchers to com- 
bine observations and models to find out what 
has happened, and perhaps to improve fore- 
casting systems, says Axel Timmermann, an 
oceanographer at the University of Hawaii. 

One possible explanation, he says, is that the
expected westerly wind bursts came too early
last year, so they did not pile up enough warm
water in the eastern Pacific to inhibit upwelling.
That would have stopped El Niño in its tracks.
But there is also a chance that an overlooked
mechanism enabled cool water from deep layers
to reach the surface. Or it might simply be
that the erratic nature of the El Niño Southern
Oscillation (ENSO), the irregular sequence
of warm El Niño and cold La Niña phases, is
down to the randomness of the weather.

To test these hypotheses, researchers will 
need many forms of data, including measure- 
ments of ocean temperature over time, 
upwelling rates, water density and the strength 
of currents. And it will be important to compare
El Niño years to neutral years and times
when La Niña appears, as well as years when
an event seems to be looming and then fails to 
materialize, says Matthew England, a climate 
scientist at the University of New South Wales 
in Sydney, Australia. 

The problem is made even more difficult
because ENSO behaviour may be shifting as
a result of climate change. Warmer surface
waters make it easier for an El Niño to start,
so researchers expect the events to become
more frequent. Last year, a model-based study
by Wenju Cai, a physical oceanographer with
the Commonwealth Scientific and Industrial
Research Organisation in Aspendale, Australia,
in which Timmermann was involved,
suggested that by the end of the century,
extreme El Niños such as the 1997-98 event
will occur twice as often as they have in recent
decades (W. Cai et al. Nature Clim. Change 4,
111-116; 2014).


PROBLEM CHILD 

The Pacific warming was first described in
the late 1880s by a Peruvian Navy captain who
reported on an unusually warm ‘corriente del
Niño’ (ocean current of the Christ Child), so
named because it appeared around Christmas
time. For a long time, El Niño was thought to
be a local phenomenon off Peru and Ecuador.
But measurement campaigns during the International
Geophysical Year 1957-58, which
coincided with a major El Niño, revealed that
the phenomenon spans the whole Pacific
Ocean. Over the decades since, research on El
Niño and La Niña has shown how conditions
in the ocean and atmosphere reinforce each
other to produce warming and cooling.

Although El Niño/La Niña events can cause
powerful changes to weather, science-funding
agencies have been reluctant to sponsor expensive
research expeditions to study them because
they are so hard to forecast. Researchers instead
rely to a large extent on data from the Tropical
Atmosphere Ocean network of buoys strung
across the Pacific, which is jointly operated
by NOAA and the Japan Agency for Marine-Earth
Science and Technology (JAMSTEC).
Temperature and salinity data from the array
of 70 or so moored buoys allow researchers to
detect unusual ocean warming and track the
large waves that push warm water eastwards.

But the array is not without problems. Many
buoys have failed in recent years, temporarily
leaving scientists with data from just 40% of the
network. Thanks to repair work, the system is
currently back at 80% capacity. But budget
cuts in 2012 forced NOAA to decommission
a ship, the RV Ka’imimoana, which was used
for regular maintenance of the array. Over its
16-year service with NOAA, the ship had also
made itself invaluable to the El Niño research
community by collecting data on factors
including water temperature, salinity and density
as it made its servicing runs to the buoys.
With the ship no longer available, data collected
from buoys and autonomous floats are not sufficient
to study the subtle changes in currents
and ocean mixing that may be involved in
El Niño evolution, says Timmermann.

Researchers have other vessels available, such
as the RV Falkor, on which Richards made his
trip. The RV Kilo Moana hosted a second University
of Hawaii team in the equatorial Pacific
in August and September. Oceanographers
Brian Popp and Jeffrey Drazen had an unexpected
research opportunity: they had planned
to study mercury accumulation in marine
organisms in a region with strong equatorial
upwelling, but the data they collected during
their expedition will allow them to examine
how the effects of a strong El Niño might ripple
through the marine food web. And yet, says
Cai, this year’s El Niño, possibly a once-in-a-generation
event, is a missed opportunity
with respect to going out and documenting
the breadth of physical, chemical and biological
changes that might occur in the ocean. “It’s
a pity we can't have more ships at sea,” he says.

Unruly ocean

El Niño warmings develop when equatorial trade winds weaken, letting warm
water and thunderstorm activity spread from the west Pacific Ocean to the east.

Help might be on the way. By 2020, NOAA
and JAMSTEC hope to have launched a sustained
Tropical Pacific Observing System of
buoys and satellite instruments to advance
understanding of ocean variability and
improve weather and climate prediction.

That will be too late to help with the current
El Niño, which is expected to peak late this
year or early next. In recent months, it has been
keeping pace with the most powerful El Niños
on record, and westerly wind outbreaks in early
October promised to keep the warming going.
As a result, forecasters are warning many parts
of the globe to prepare for some wild and crazy
weather over the next several months.


Quirin Schiermeier reports for Nature from 
Munich, Germany. 
THE TIME LAB


WHY DOES MODERN LIFE 
SEEM SO BUSY? 


AN OXFORD CENTRE IS TRYING TO FIND ANSWERS 
WITH THE WORLD'S BIGGEST COLLECTION OF 
TIME-USE DIARIES. 


BY HELEN PEARSON 


In 1961, when more and more people were buying television sets to
go with their radios, the BBC wanted to work out the best times to
air its programmes. So its audience-research department decided
to ask a sample of people across the United Kingdom to record what
they were doing every half hour of the day, and to indicate whether the
TV or radio was on.

The result was a trove of 2,363 diaries filled with the everyday details 
of British lives. “8 a.m., Eating breakfast,” read one; “8.30 a.m., Taking 
children to school; 9 a.m., Cleaning away, washing up and listening to 
Housewives’ Choice” — a popular radio record-request programme 
of the day. 

Today, these files are part of the biggest collection of time-use diaries 
in the world, kept by the Centre for Time Use Research at the University 
of Oxford, UK. The centre's holdings have been gathered from nearly 
30 countries, span more than 50 years and cover some 850,000 person- 
days in total. They offer the most detailed portrait ever created of when 
people work, sleep, play and socialize — and of how those patterns have 
changed over time. “It certainly is unique,” says Ignace Glorieux, a sociologist
at the Dutch-speaking Free University of Brussels. “It started
quite modest, and now it’s a huge archive.”

The collection is helping to solve a slew of scientific and societal 
puzzles — not least, a paradox about modern life. There is a widespread 
perception in Western countries that life today is much busier than it once 
was, thanks to the unending demands of work, family, chores, smart- 
phones and e-mails. But the diaries tell a different story: “We do not get 
indicators at all that people are more frantic,” says John Robinson, a sociologist
who works with time-use diaries at the University of Maryland,
College Park. In fact, when paid and unpaid work are totted up, the aver- 
age number of hours worked every week has not changed much since the 
1980s in most countries of the developed world. 

Epidemiologists, meanwhile, are mining the diaries to explain how 
lifestyle changes are contributing to a rise in many chronic diseases. The 
diaries “were the greatest asset I could possibly have”, says physiologist
Edward Archer at the University of Alabama at Birmingham, who used
the data in a 2013 study of obesity.

Now, the Oxford centre is testing a major update to its 50-year-old 
methods. In addition to asking people to complete a handwritten diary, 
it last year began giving them an electronic fitness tracker and a small
camera that snaps a stream of pictures of their day (see ‘The gadget
guinea pig’). “The idea is to be a bit more adventurous,” says Teresa
Harms, a sociology research fellow who is leading the project. “Are new 
technologies better than what we've been doing all these years?” 


TIME MANAGEMENT 

Ironically for a scientific institute dedicated to time use, the researchers 
at the Oxford centre are no better at time management than anyone 
else. If anything, they are worse. One day in July, students were playing 
a game of croquet on the lawn outside the centre’s home: a stone build- 
ing in the placid grounds of St Hugh’s College. But inside, things were 
more fraught. One flustered postdoc had slept through her alarm and 
arrived at 10.33 a.m. — more than an hour late for her meeting. The cen- 
tre’s ebullient founder and co-director, sociologist Jonathan Gershuny, 
cheerily admitted that his own time management is “terrible” — shortly 
before locking himself out of his office without his keys, ensuring that 
he would arrive for his next appointment catastrophically late. 

None of this seems to have slowed down Gershuny, who can trace 
the origins of the centre to the 1970s, when he was starting his career at 
the University of Sussex in Brighton, UK. Gershuny wanted to predict 
what society and the economy would look like in future decades — but 
he realized that there was very little empirical evidence showing how 
people actually spend their time. 

Gershuny started to search for surveys in which people had been 
asked to record their daily activities. Among his first discoveries were 
the BBC diaries. Another thousand or so journals, recorded in the 
1930s, turned up in a mouldering old tea chest at the university. 

[Figure: Analysis of diaries reveals how the average use of time has changed on a standard weekday between 1961 and 2015 in the United Kingdom (similar patterns are seen in other developed countries). In broad terms, the data show a slight growth in leisure time for men and women, and that patterns of paid work are changing for both. For men, hours spent in paid work have declined; hours spent doing unpaid work (domestic work and childcare) have increased, but do not match the time spent by women. Women spend more time in the workplace, and have reduced the time spent on unpaid work. Despite a perception that life has become busier, the number of people in a US survey who report feeling ‘always rushed’ has fallen in the past decade. A comparison of estimated work hours with actual hours (based on US time-use diaries, 2003-07) showed that people often overestimate how much they work.]

As Gershuny’s diary collection grew, it became obvious that the
records had been gathered by many different investigators around the
world for many different purposes. To align the data and make meaningful
comparisons, he would have to put them into a standardized form.
So in the 1980s, by then working at the University of Bath, UK, he
developed the Multinational Time Use Study: a system in which every
activity is given one of 41 codes (gardening, 9; sleeping, 16; relaxing, 36).
In an early project that assessed diaries from the United States and the
United Kingdom, Gershuny and Robinson showed in 1988 that women
in both nations were spending less time on domestic work, whereas men
were doing slightly more, a consequence of women’s increasing entry
into paid employment and of shifting societal norms.
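
To make the idea of a standardized coding scheme concrete, here is a minimal sketch in Python. It assumes a diary day can be represented as a list of (activity, minutes) entries. Only the three codes quoted above (gardening, 9; sleeping, 16; relaxing, 36) come from the text; the function name, the data layout and the sample day are hypothetical and are not the centre's actual software.

```python
from collections import defaultdict

# Only these three codes are quoted in the article; the full scheme has 41.
ACTIVITY_CODES = {"gardening": 9, "sleeping": 16, "relaxing": 36}


def minutes_by_code(diary):
    """Sum minutes per activity code for one diary day.

    `diary` is a hypothetical list of (activity_name, minutes) pairs.
    Activities without a known code are grouped under None.
    """
    totals = defaultdict(int)
    for activity, minutes in diary:
        totals[ACTIVITY_CODES.get(activity)] += minutes
    return dict(totals)


# Invented sample day, for illustration only.
sample_day = [("sleeping", 480), ("gardening", 45), ("relaxing", 90), ("sleeping", 30)]
print(minutes_by_code(sample_day))
```

Once every diary uses the same numeric codes, records collected by different investigators can be summed and compared in the same way, which is the point of the standardization described above.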

By the early 2000s, many countries had started to collect standardized 
time-use data; the US Bureau of Labor Statistics started gathering them 
annually in 2003. These efforts were driven by a growing global interest in 
understanding the impacts of time use on economies and on well-being. 

But the diary bank still remained something of a sideline for
Gershuny until 2008, when he won funding to develop a centre at
Oxford dedicated to time-use research. Then, in 2013, the centre was
awarded two major grants: €2.5 million (US$2.8 million) from the
European Research Council and £3.7 million (US$5.7 million) from
the UK Economic and Social Research Council (ESRC), to exploit the
diary bank and to launch a major collection of time-use diaries in the
United Kingdom.

[Sidebar with figure; surviving panel labels: ‘The diary’, ‘Camera’, ‘Enjoyed time’.]
On 16 July 2015, I wore an accelerometer that tracks movement
and a camera that took three images per minute. I also recorded
what I was doing, and how much I was enjoying it, in a written diary.
The Oxford Centre for Time Use Research in the United Kingdom is
collecting gadget diaries in this way to find out if they produce
more-useful information than do conventional paper diaries. H.P.

“We were paid suddenly to do the things that I’d been agitating to do,”
Gershuny says. And one of those things has been to find out why some 
people feel so busy all the time. 


NO TIME 

In 1930, the economist John Maynard Keynes wrote an essay predicting 
life 100 years ahead. The United States and Europe would be so pros- 
perous that people would work just 15 hours a week, he said, and the 
main concern for “our grandchildren” would be how to fill their copious 
leisure time. 

That’s not quite how things are turning out — something that 
Gershuny started to think about in the early 2000s. He was feeling des- 
perately busy — more so than in the past — and people around him 
were complaining that they were stressed out and working harder 
as well. Books on the matter were proliferating, with titles such as 
Fighting for Time, Busier than Ever and Work Without End. Survey data
hinted at the problem too: in the United States, the proportion of people
reporting that they ‘always’ felt rushed was 24% in 1965 but 34% in 2004.

Yet when researchers used diary data to look into the matter, a
different picture emerged. Analyses showed that people in many countries
routinely overestimate the amount of time that they spend working;
in the United States, by some 5-10% on average (see ‘The truth
about time’). But those who work longer hours tend to overestimate by
the most: people who guess that they work 75-hour weeks, for example,
can be over by more than 50%, and those of certain professions (teachers,
lawyers, police officers) overestimate by more than 20%. (Scientists
were not the worst exaggerators: they estimate working close to
42 hours per week on average, whereas diaries clock them at 39 hours).

In a 2005 study, Gershuny compared the BBC diaries from 1961
with UK diaries collected in 1983-84 and 2001, adding up the number 
of minutes per day spent on paid work, unpaid work (such as chores 
around the house) and other activities. He wanted to know whether 
people were actually working longer hours than they did 40 years ago. 

The answer was that it depends. Men had reduced the number of 
hours they spent on paid work, increased those in unpaid work and 
overall came out ahead, with just under 50 minutes more free time per 
day. Women were doing more paid work — again reflecting their move- 
ment into the workplace over the decades — and less unpaid work, 
producing little change overall. 

Studies in the United States and western European countries have 
shown similar patterns: little overall change in work time and, at least in
some studies and groups, a slight growth in leisure time. All in all, there 
is little support for the idea that everyone is working harder than ever 
before. “When you look at national averages of time-use data, it doesn't 
really show up,” says Oriel Sullivan, a sociologist who now co-directs
the centre with Gershuny. 

But certain groups have experienced a different trend. According 
to analyses by Gershuny, Sullivan and other time-use researchers, 
two demographic groups are, in fact, working harder. One consists of 
employed, single parents, who put in exceptionally long hours compared 
to the average; the other comprises well-educated professionals,
particularly those who also have small children. People in this latter group
find themselves pushed to work hard and under societal pressure to 
spend quality time with their kids. “The combination of those pressures 
has meant that there is this group for which time pressure is particularly 
pertinent,” Sullivan says. 

These findings, the researchers say, could help to explain why there 
is a widespread perception that life is busier for everyone. Sullivan 
and Gershuny propose that the time-squeezed professional group 
includes many of the academics who study and discuss the phenom- 
enon, as well as the journalists who write about it — in other words, 
the people in society with a loud voice.

But Gershuny suggests that changing atti- 
tudes to work and leisure may also play a part. 
In nineteenth-century Europe, having ample 
leisure time signified a person of high social 
status: one philosopher described the literary 
types in Paris around 1840, who had such an 
abundance of time that it was fashionable to 
walk a turtle on a leash through the arcades. 

In the twenty-first century, the situation 
has reversed, so that being busy is a signal of a
privileged social position — and therefore an 
impression that some people are keen to give. 
Gershuny calls being busy in the modern day 
a “badge of honour”, and Glorieux agrees. “The
first thing we say when we meet people is ‘I’m
busy’,” he says. “Suppose we say, ‘I’m not busy;
I have nothing to do; I was watching some TV.’
It’s not what people want to say.”

People might also feel busier because of an 
increase in multitasking, especially with com- 
puters and smartphones. The US time-use 
diaries are poor at recording how long people 
engage with their devices, says Robinson — in 
part, he suspects, because they have become so 
prevalent that people don't even report when 
they are on them. 


FAMILY TIME 

The diary bank at Oxford has revealed other 
changes in how people use their time. In 
2011, Oxford centre postdoc Evrim Altintas 
drew on diaries collected in the United States 
between 1965 and 2013 to examine how 
much time parents spend on ‘developmental 
childcare’ — engaging with children through 
reading, talking and helping with homework.
Activities of this type are strongly associated 
with better educational scores, behaviour and 
other positive outcomes in later life. 

Altintas found that parents overall were 
spending more time on developmental child- 
care in the 2000s than they were in the 1960s 
and 1970s, but she found a bigger increase 
among parents who both had university 
degrees than for those who had high-school 
diplomas or less. She estimated that a child 
born to a highly educated mother in the early 
2000s would receive 27 minutes more devel- 
opmental childcare per day than one born to 
less-educated parents — adding up to 657 extra 
hours of focused attention for that child during 
the first four years of life. “That puts the chil- 
dren who are born to less-educated parents at 
a real disadvantage,” Altintas says. 
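
As a quick check on the arithmetic behind that estimate, the sketch below multiplies the 27-minute daily gap over the first four years of life. It assumes 365-day years (leap days ignored), and the variable names are illustrative only.

```python
# Daily gap in developmental childcare reported in the text, in minutes.
extra_minutes_per_day = 27
days_per_year = 365  # assumption: leap days are ignored
years = 4

# Convert the accumulated minutes to hours.
total_hours = extra_minutes_per_day * days_per_year * years / 60
print(total_hours)  # 657.0
```

Under these assumptions the total matches the 657 hours quoted above.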

The diaries have also exposed trends that 
could affect the health of adults. In his study 
on obesity, Archer analysed more than 50,000
diary days collected between 1965 and 2010 and 
divided women’s time into paid work, house- 
hold work, personal care and free time. Then 
he calculated what that meant for the amount 
of energy they were burning up. The results 
showed that women in 2010 were spending 
around 12 hours less per week on cooking, 
cleaning, laundry and other domestic work
than women in 1965, and that had shifted
towards more-sedentary pursuits such as using 
a computer. As a result, the team estimated 
that working women today are burning some 
130 kilocalories per day less than those in the 
1960s, and they proposed that this could be one 
explanation for the rise of obesity in the United 
States. (Archer stresses that he is not saying that 
women should do more housework; rather, the 
work reinforces public-health advice encourag- 
ing more physical activity of any kind.) 




At Oxford, Gershuny and Harms are 
attempting to carry out more-detailed analyses 
of energy use in collaboration with researchers 
at the US National Cancer Institute in Rock- 
ville, Maryland. Harms has been matching up 
entries in a selection of diaries to a list, widely 
used in research, of more than 800 activities 
alongside estimates of the energy burned 
doing each. (It includes entries as specific as 
playing darts, coalmining, whirlpool-sitting 
and casino gambling.) The study, which is still 
under way, has so far shown that a gym ses- 
sion or other structured work-out accounts 
for only a small fraction of the energy that a 
person typically burns each day. Activities such 
as paid work and childcare often burn more — 
because even if they are less physically intense, 
they take up longer periods of time. “The real
metabolic activity is built up during the working
day,” Harms says.
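
To illustrate why long, low-intensity activities can outweigh a short work-out, here is a minimal sketch. It assumes the common convention that 1 MET corresponds to roughly 1 kilocalorie per kilogram of body weight per hour. The MET values, body weight and durations are illustrative assumptions, not figures from the Oxford study or from the activity list it uses.

```python
def energy_kcal(met: float, weight_kg: float, hours: float) -> float:
    """Approximate energy burned, using 1 MET ~ 1 kcal per kg per hour."""
    return met * weight_kg * hours


WEIGHT_KG = 70.0  # assumed body weight

# (activity, assumed MET value, assumed duration in hours)
day = [
    ("gym session", 7.0, 0.75),
    ("paid work (mostly seated)", 1.5, 8.0),
    ("childcare", 2.5, 2.0),
]

# Energy per activity for this hypothetical day.
totals = {name: energy_kcal(met, WEIGHT_KG, hours) for name, met, hours in day}
for name, kcal in totals.items():
    print(f"{name}: {kcal:.0f} kcal")
```

With these assumed values, the eight-hour working day accounts for more than twice the energy of the gym session, even though it is far less intense.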


NEXT-GENERATION DIARIES 
Since Gershuny started his diary bank, time- 
use research has become a thriving industry: 
there are now several hundred researchers 
in the field. But time-use diaries have weak- 
nesses, and the biggest is that they can be 
wrong: people quickly forget what they were 
doing and record their days inaccurately — or 
they can lie. The desire to improve accuracy is 
one of the motivations behind CAPTURE-24: 
Gershuny and Harms’ project to collect a new 
generation of diaries using some of the latest 
gadgets around. 

So far, about 150 people have each spent 
24 hours wearing a watch-like accelerometer 
strapped to their wrists and a small camera,
which takes around three pictures per minute, 
hung around their necks. They also jot down 
what they are doing for every 10-minute slot 
of the day in a conventional paper diary. The 
aim is to see whether the devices can provide 
more-useful information for the researchers 
than a standard paper diary alone. 

The accelerometer should collect more- 
accurate data on body movement and energy 
use, one reason that the project has earned 
support from biomedical-research funders the 
British Heart Foundation and the Wellcome 
Trust, both in London, as well as the ESRC. 
The photos could keep a more faithful record 
of when and what people eat — food diaries are 
notoriously unreliable — or reveal important 
nuances in people's interactions with children. 
(It is harder to say that you were focusing on 
childcare when the camera pictures show that 
you were checking your phone.) 

In her preliminary analyses, Harms has 
found that gadget diaries and paper diaries 
show the same sequence of events, but that the 
gadgets reveal details that paper diaries missed. 
Most researchers in the field agree that the 
future lies in collecting data through phones 
and other devices. “Maybe this will bring a new 
boost to time-use research,” Glorieux says. He
anticipates a situation in which reams of diary 
data — such as location, heart rate, calories 
burned and even ambient noise — are col- 
lected through phones and linked-up gadgets. 

The researchers at Oxford are keen to grow 
their diary collection in other ways. They 
recently added ones from China, South Korea 
and India, and they are trying to include more 
from Eastern European and developing coun- 
tries. And Gershuny holds out hope that there 
are more tea chests of old diaries still waiting to 
be found. Then, scientists can begin to exam- 
ine cultural differences in how people from 
different regions work, rest and play. 

So many questions, so much data, so little 
time. Clearly, doing all this is going to take a 
lot more than Keynes's 15 hours a week. But 
the scientists hope to get there, by taking it one 
day at a time.


Helen Pearson is Chief Features Editor for 
Nature. 
China’s cities are among the world’s worst in terms of air quality.


China’s choking cocktail 


Cleaning up city and indoor air will require a deeper understanding of the 
unprecedented chemical reactions between pollutants, says Markku Kulmala. 


Dirty air threatens the health of
billions of city dwellers around the
world. China's megacities are among
the worst, with concentrations of airborne 
pollutants 10-100 times higher than those 
in Europe or North America, and occasion- 
ally even 1,000 times higher. An estimated 
2.5 million people in China die each year 
from the health effects of indoor and outdoor
air pollution.


Efforts to improve air quality are target- 
ing only the tip of the iceberg. Cities such 
as Beijing routinely measure levels of particulate
matter measuring up to 10 micrometres
(PM10) and 2.5 micrometres (PM2.5) in size,
as well as a few gases such as sulfur dioxide
(SO2), nitrogen oxides (NOx), carbon monoxide
(CO) and ozone. But urban air is a
complex cocktail of chemicals whose poorly 
understood interactions and feedbacks 
may exacerbate health problems. Efforts
to reduce one pollutant can have perverse 
effects on others as conditions change. 

The chemistry of China’s polluted urban 
air is unprecedented. Higher populations, 
heavier industries and modern goods 
manufacturing, as well as the climatic conditions,
make Beijing’s smogs markedly different
from the ‘pea soupers’ that afflicted
London and other European cities
50 to 100 years ago. Many atmospheric
processes are nonlinear — meaning that the 
relationship between cause and effect is not 
proportional. So we do not know and can- 
not predict which harmful compounds are 
being formed. Whenever my colleagues and 
I make measurements with new instruments 
in China, we find unexpected results. Indoor 
air quality is equally affected. 

The nation’s urban air can begin to be 
cleaned only if a comprehensive monitoring
and modelling approach takes atmospheric 
chemistry into account. To guide decisions, 
we need to know which hazardous pollutants 
are present and how they interact to generate 
secondary pollution. 

Here I outline a road map for bringing 
China’s cities up to European air-quality 
standards within a decade. Actions towards 
cleaner air, as well as improving health (see 
‘Deep clear’), will reduce greenhouse-gas 
and black-carbon (soot) concentrations and 
enhance freshwater quality’ and food supply. 


TOXIC MIX 
China’s air pollution has worsened as emissions
from industry, energy production and
traffic have grown. China is responsible for
30-35% of the global SO2, NOx, CO and particulate
emissions, and 40% of global particle
numbers (PN) in the 20-1,000-nanometre
size range (see go.nature.com/uw3jx6). The
nation’s share of global greenhouse-gas emissions
is 29% for carbon dioxide and almost
20% for methane.
Government efforts are under way to
reduce the emissions of all these.
But it is impossible to reduce secondary
pollutants such as ozone and organic
aerosols without a deep understanding of 
the chains of chemical reactions and physi- 
cal processes that pollutants undergo. The 
formation and decay of secondary pollutants 
depend on temperature, humidity and wind 
speed as well as other chemicals and particles 
in the urban atmosphere. There is so much 
that we do not know. Results are hard to 
predict because many physical and chemical
processes (such as surface chemistry,
oxidation, clustering and dynamical effects)
happen simultaneously.

Attempts to control one pollutant can 
increase the concentration of others. Measurements
in Nanjing, for example, show
that reducing NOx emission could cause a
tenfold increase in summer ozone concentrations.
Reducing smog increases sunshine
levels and temperatures, and alters rain and
snowfall patterns.

Surprising reactions are going on above 
Chinese cities that do not occur in cleaner 
air. For example, small atmospheric
molecular clusters (measuring 1-3 nanometres)
are tens of times more concentrated in
Shanghai, Nanjing and Beijing than in European
cities. Secondary aerosols (containing
sulfates, nitrogen compounds and organics)
form more readily in Shanghai and Nanjing
than existing models predict. Unknown
chemical pathways and physical processes
must be occurring that could create new
types of oxidant or change the surface properties
of aerosols, limiting their ability to take
up condensable vapours.

Pollutant levels inside Chinese homes can be hundreds of times higher than those in European homes.

Indoor air quality is also a problem. City 
dwellers spend more than 90% of their time 
indoors, particularly if outdoor air quality 
is poor. Cooking, smoking, heating and 
furnishings release PM, PN, CO, volatile 
organic compounds (VOCs) and NOx,
adding to pollutants drawn in from outside.
Little is known about the pollution levels
in Chinese homes, but my team’s preliminary
findings indicate that concentrations
of some pollutants are higher than those
outdoors (especially for PN and VOCs) and
100-1,000 times higher than inside European
homes. My colleagues and I estimate
that indoor air pollution contributes almost
as much as outdoor smog (which causes
more than 1.3 million deaths per year) to
the pollution-related death toll.

DEEP CLEAN (figure): Deaths from air pollution will rise as cities and industries expand. Limiting single pollutants will buy time, but only a concerted effort will clean China’s urban air for good within a decade. The plot shows annual mortality due to pollution over the years for three scenarios: no reductions, hasty actions and a holistic solution (monitor all pollutants, understand their reactions and install clean technologies).

Again, little is known about the sec- 
ondary production of chemicals indoors. 
Ventilation systems filter large particles 
but not gases such as VOCs, NOx and SO2,
which can go on to form ultrafine sulfates, 
nitrates and organic particles. The risk of 
secondary aerosol production from these 
gases rises in filtered air because there are 
fewer grains on which to condense. As 
outdoor air gets cleaner, indoor air quality 
might even worsen. 


CLEARING THE AIR 
It is important to continue efforts to cut 
pollution. But meeting the Chinese central 
government's goal of improving urban air 
quality to levels typical of the United States 
and Europe requires more: simultane- 
ous tracking of all air pollutants relevant 
to health and their feedbacks and inter- 
actions — over at least a decade. We need to 
understand how the mixture and its toxicity 
change as air-quality measures are implemented.
Short campaigns are not sufficient.
Central and regional governments, 
research institutes and universities should 
collaborate to do the following, with help 
from atmospheric chemists from around
the world. 

First, establish and fund a network of
‘flagship’ stations to monitor concentrations,
fluxes, interactions and feedbacks
as well as more general air quality and 
meteorology data. Around 5-8 such 
stations (costing between US$7 million 
and $11 million each) would suffice 
for a major city. These should be com- 
plemented by mobile measurement 
platforms on cars and aeroplanes, remote 
sensing of air columns from the ground, 
satellite observations and smog cham- 
bers. Major sources of pollutants can be 
identified using historic data. 

Second, indoor air-quality meas- 
urements and monitoring must be 
conducted concurrently in a represent- 
ative selection of residential and office 
buildings. 

Third, atmospheric chemists must 
model secondary-pollutant production 
pathways and feedback mechanisms 
under high concentrations of various 
pollutants. These models must then be 
compared with observations. 

Fourth, the links between air pollutants 
and mortality and other health effects 
need to be established. That way the 
most health-relevant pollutants and their 
sources can be identified and mitigated 
first. A database should be developed to 
track health impacts. 

Fifth, long-term sustainable engineer- 
ing solutions such as improving processes 
and material flows in industry must be 
implemented to maintain low levels of 
air pollution. This will require capacity 
building across the Chinese authori- 
ties and institutes on using air-quality 
assessment data in decision-making, in 
developing legislative tools and in clean- 
air action plans. 

Only by understanding atmospheric 
chemistry will China clean its air.


Markku Kulmala is professor of aerosol 
physics at the University of Helsinki, 
Finland. 

e-mail: markku.kulmala@helsinki.fi 
The mysterious Indus unicorn on a roughly 4,000-year-old sealstone, found at the Mohenjo-daro site.


Cracking the 
Indus script 


Andrew Robinson reflects on the most tantalizing of all 
the undeciphered scripts — that used in the civilization 
of the Indus valley in the third millennium BC.


The Indus civilization flourished
for half a millennium from about
2600 BC to 1900 BC. Then it mysteriously
declined and vanished from view.
It remained invisible for almost 4,000 years 
until its ruins were discovered by accident 
in the 1920s by British and Indian archae- 
ologists. Following almost a century of exca- 
vation, it is today regarded as a civilization 
worthy of comparison with those of ancient 
Egypt and Mesopotamia, as the beginning of 
Indian civilization and possibly as the origin 
of Hinduism. 

More than a thousand Indus settlements 
covered at least 800,000 square kilometres 
of what is now Pakistan and northwest- 
ern India. It was the most extensive urban 
culture of its period, with a population of
perhaps 1 million and a vigorous maritime 
export trade to the Gulf and cities such as 
Ur in Mesopotamia, where objects inscribed 
with Indus signs have been discovered. 
Astonishingly, the culture has left no archae- 
ological evidence of armies or warfare. 
Most Indus settlements were villages; 
some were towns, and at least five were 
substantial cities (see ‘Where unicorns
roamed’). The two largest, Mohenjo-daro
(a World Heritage Site listed by the United
Nations), located near the Indus river, and
Harappa, by one of the tributaries, boasted
street planning and house drainage worthy
of the twentieth century AD. They hosted
the world’s first known toilets, along with
complex stone weights, elaborately drilled
gemstone necklaces and exquisitely carved 
seal stones featuring one of the world’s 
stubbornly undeciphered scripts. 


FOLLOW THE SCRIPT 

The Indus script is made up of partially 
pictographic signs and human and animal 
motifs including a puzzling ‘unicorn’. These
are inscribed on miniature steatite (soap- 
stone) seal stones, terracotta tablets and 
occasionally on metal. The designs are “little 
masterpieces of controlled realism, with a 
monumental strength in one sense out of 
all proportion to their size and in another 
entirely related to it”, wrote the best-known 
excavator of the Indus civilization, Mortimer 
Wheeler, in 1968.

Once seen, the seal stones are never forgot- 
ten. I became smitten in the late 1980s when 
tasked to research the Indus script by a leading 
documentary producer. He hoped to entice 
the world’s code-crackers with a substantial 
public prize. In the end, neither competition 
nor documentary got off the ground. But for 
me, important seeds were sown. 

More than 100 attempts at decipherment 
have been published by professional scholars 
and others since the 1920s. Now, as a result
of increased collaboration between archaeologists,
linguists and experts in the digital
humanities, it looks possible that the Indus
script may yield some of its secrets.

Since the discovery of the Rosetta Stone in 
Egypt in 1799, and the consequent decipher- 
ment of the Egyptian hieroglyphs beginning 
in the 1820s, epigraphers have learnt how 
to read an encouraging number of once- 
enigmatic ancient scripts. For example, the 
Brahmi script from India was ‘cracked’ in 
the 1830s; cuneiform scripts (character- 
ized by wedge-shaped impressions in clay) 
from Mesopotamia in the second half of the 
nineteenth century; the Linear B script from 
Greece in the 1950s;
and the Mayan glyphs from Central America
in the late twentieth century.

Several important scripts still have scholars
scratching their
heads: for example, Linear A, Etruscan from
Italy, Rongorongo from Easter Island, the 
signs on the Phaistos Disc from the Greek 
island of Crete and, of course, the Indus script. 
In 1932, Flinders Petrie — the most cel- 
ebrated Egyptologist of his day — proposed 
an Indus decipherment on the basis of the 
supposed similarity of its pictographic 
principles to those of Egyptian hieroglyphs. 
In 1983, Indus excavator Walter Fairservis
at the American Museum of Natural History
in New York City claimed in Scientific
American that he could read the signs in
a form of ancient Dravidian: the language
family from southern India that includes
Tamil. In 1987, Assyriologist James Kinnier
Wilson at the University of Cambridge, UK,
published an ‘Indo-Sumerian’ decipherment,
based on a comparison of the Indus
signs with similar-looking ones in cuneiform
accounting tablets from Mesopotamia.

Mohenjo-daro existed at the same time as the civilizations of ancient Egypt, Mesopotamia and Crete.


THREE PROBLEMS 

In the 1990s and after, many Indian authors 
— including some academics — have 
claimed that the Indus script can be read in a
form of early Sanskrit, the ancestral language 
of most north Indian languages including 
Hindi. In doing so, they support the con- 
troversial views of India’s Hindu nationalist 
politicians that there has been a continuous, 
Sanskrit-speaking, Indian identity since the 
third millennium BC.

Whatever their differences, all Indus 
researchers agree that there is no consensus 
on the meaning of the script. There are three 
main problems. First, no firm information 
is available about its underlying language. 
Was this an ancestor of Sanskrit or Dra- 
vidian, or of some other Indian language 
family, such as Munda, or was it a language 
that has disappeared? Linear B was deci- 
phered because the tablets turned out to be 
in an archaic form of Greek; Mayan glyphs 
because Mayan languages are still spoken. 
Second, no names of Indus rulers or per- 
sonages are known from myths or histori- 
cal records: no equivalents of Rameses or 
Ptolemy, who were known to hieroglyphic
decipherers from records of ancient Egypt 
available in Greek. 

Third, there is, as yet, no Indus bilingual 
inscription comparable to the Rosetta Stone 
(written in Egyptian and Greek). It is con- 
ceivable that such a treasure may exist in 
Mesopotamia, given its trade links with the 
Indus civilization. The Mayan decipherment 
started in 1876 using a sixteenth-century 
Spanish manuscript that recorded a discus- 
sion in colonial Yucatan between a Spanish 
priest and a Yucatec Mayan-speaking elder 
about ancient Mayan writing. 


WHAT WE KNOW 
Indus scholars have achieved much in 
recent decades. A superb three-volume 
photographic corpus of Indus inscriptions,
edited by the indefatigable Asko Parpola, an 
Indologist at the University of Helsinki, was 
published between 1987 and 2010 with the 
support of the United Nations Educational, 
Scientific and Cultural Organization; a 
fourth and final volume is still to come. The 
direction of writing — chiefly right to left 
— has been established by analysis of the 
positioning of groups of characters in many 
differing inscriptions. The segmentation 
of texts containing repeated sequences of 
characters, syntactic structures, the numeral 
system and the measuring system are partly 
understood. 

Views vary on how many signs there are
in the Indus script. In 1982, archaeologist
Shikaripura Ranganatha Rao published
a Sanskrit-based decipherment with just
62 signs. Parpola put the number at about
425 in 1994, an estimate supported by
the leading Indus script researcher in
India, Iravatham Mahadevan. At the other
extreme is an implausibly high estimate
of 958 signs, published this year by Bryan
Wells, arising from his PhD at Harvard University
in Cambridge, Massachusetts.

WHERE UNICORNS ROAMED (map): Mohenjo-daro and Harappa, the two largest Indus cities, boasted complex street planning and house drainage.

Nevertheless, almost every researcher 
accepts that the script contains too many 
signs to be either an alphabet or a syllabary 
(in which signs represent syllables), like 
Linear B. It is probably a logo-syllabic script 
— such as Sumerian cuneiform or Mayan 
glyphs — that is, a mixture of hundreds of 
logographic signs representing words and 
concepts, such as &, £ and %, and a much 
smaller subset representing syllables. 

As for the language, the balance of evi- 
dence favours a proto-Dravidian language, 
not Sanskrit. Many scholars have proposed 
plausible Dravidian meanings for a few 
groups of characters based on Old Tamil, 
although none of these ‘translations’ has 
gained universal acceptance. 

A minority of researchers query whether 
the Indus script was capable of expressing 
a spoken language, mainly because of the 
brevity of inscriptions. The carvings average
five characters per text, and the longest
has only 26. In 2004, historian Steve
Farmer, computational linguist Richard
Sproat (now a research scientist at Google)
and Sanskrit researcher Michael Witzel
at Harvard University caused a stir with
a joint paper comparing the Indus script
with a system of non-phonetic symbols akin
to those of medieval European heraldry or
the Neolithic Vinča culture from central and
southeastern Europe.

WHERE UNICORNS ROAMED (map caption): Between about 2600 BC and 1900 BC, more than a thousand settlements of the Indus civilization, including at least five cities, covered at least 800,000 square kilometres. Only 10% of sites have been excavated, partly because many lie near the tense border between Pakistan and India. Ganweriwala was discovered in the 1970s, but remains unexcavated. Map key: extent of Indus valley region; disputed territory.

This theory seems unlikely, for various 
reasons. Notably, sequential ordering and 
an agreed direction of writing are universal 
features of writing systems. Such rules are 
not crucial in symbolic systems. Moreover, 
the Indus civilization must have been well 
aware through its trade links of how cunei- 
form functioned as a full writing system. 

Nevertheless, the brevity of Indus texts 
may indeed suggest that it represented only 
limited aspects of an Indus language. This is
true of the earliest, proto-cuneiform, writing 
on clay tablets from Mesopotamia, around 
3300 BC, where the symbols record only
calculations with various products (such as 
barley) and the names of officials. 


DIGITAL APPROACH 

The dissident paper has stimulated some 
fresh approaches. Wells — a vehement 
believer that the Indus script is a full writing 
system — working with the geoinformation 
scientist Andreas Fuls at the Technical Uni- 
versity of Berlin, has created the first, publicly 
available, electronic corpus of Indus texts 
(see www.archaeoastronomie.de). Although 
not complete, it includes all the texts
from the US-led Harappa Archaeological
Research Project. 

A group led by computer scientist Rajesh 
Rao at the University of Washington in Seat- 
tle has demonstrated the potential of a digital 
approach. The team has calculated the con- 
ditional entropies — that is, the amount of 
randomness in the choice of a token (char- 
acter or word) given a preceding token — in 
natural-language scripts, such as Sumerian 
cuneiform and the English alphabet, and in 
non-linguistic systems, such as the computer 
programming language Fortran and human 
DNA. The conditional entropies of the Indus 
script seem to be most similar to those of 
Sumerian cuneiform. “Our results increase 
the probability that the script represents 
language,” the Rao group has written. Sproat
strongly disagrees.
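
For readers unfamiliar with the measure, the following is a minimal sketch of how a conditional entropy can be estimated from a sequence of tokens by counting adjacent pairs. It is a plain frequency-based estimate written for illustration, not the Rao group's actual method, and the function name and toy sequences are assumptions.

```python
import math
from collections import Counter


def conditional_entropy(tokens):
    """Estimate H(next | previous) in bits from adjacent-token counts."""
    pairs = list(zip(tokens, tokens[1:]))
    pair_counts = Counter(pairs)
    prev_counts = Counter(prev for prev, _ in pairs)
    total = len(pairs)
    h = 0.0
    for (prev, nxt), n in pair_counts.items():
        p_pair = n / total                    # joint probability of the pair
        p_next_given_prev = n / prev_counts[prev]
        h -= p_pair * math.log2(p_next_given_prev)
    return h


rigid = list("abababab")    # each token fully determines the next
looser = list("abacabaca")  # after 'a', the next token is 'b' or 'c'
print(conditional_entropy(rigid), conditional_entropy(looser))
```

The rigid sequence gives 0 bits and the looser one 0.5 bits. Sequences with very rigid ordering score low, random ones score high, and natural-language scripts tend to fall in between, which is the kind of comparison described above.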

On the ground in Pakistan and India, 
more inscriptions continue to be discovered 
— although not, as yet, any texts longer than 
26 characters. Unfortunately, less than 10% of 
the known Indus sites have been excavated. 
The difficulty — apart from funding — is the 
politically troubled nature of the region. Many 
of the most promising unexcavated sites lie 
in the Pakistani desert region of Cholistan 
near the tense border with India. One such 
is the city of Ganweriwala, discovered in the 
1970s and apparently comparable in size with 
Mohenjo-daro and Harappa. 

If these sites, and some others within Paki- 
stan and India, were to be excavated, there 
seems a reasonable prospect of a widely 
accepted, if incomplete, decipherment of 
the Indus script. It took more than a cen- 
tury to decipher the less challenging Mayan 
script, following several false starts, hiatuses 
and extensive excavation throughout the 
twentieth century. Indus-script decipherers 
have been on the much barer trail — older 
by two millennia — for less than a century, 
and excavation of Indus sites in Pakistan has 
stagnated in recent decades.


Andrew Robinson is a science writer 
based in London. He is the author of Lost 
Languages: The Enigma of the World’s 
Undeciphered Scripts and, most recently, 
The Indus: Lost Civilizations. 

e-mail: andrew.robinson33@virgin.net

A health nurse in Ghana washes her hands before examining a baby.


BACTERIOLOGY 


Pathogens in perspective 


Andrew Jermy travels with Hugh Pennington on the arc of humanity’s long, troubled 
relationship with microorganisms. 


Before opening Hugh Pennington’s
Have Bacteria Won?, readers of newspaper
headlines might presume that
his answer is ‘yes’. But for the most part, the
eminent bacteriologist comes to the opposite 
conclusion in this thought-provoking study 
that documents the history of human interac- 
tions with infectious disease and how current 
fears of impending doom have developed. 
Pennington draws on personal experi- 
ence and illuminating case studies — such 
as the United Kingdom's experience with 
bovine spongiform encephalopathy (BSE) 
and variant Creutzfeldt–Jakob disease — to
show how the public perception and clinical 
reality of infectious disease can be at odds. 
He upbraids researchers, journalists and edi- 
tors (such as me) for using the hyperbolic 
language of war (fight, struggle, arms race) 
to describe our relationship with the micro- 
organisms that colonize and infect our bod- 
ies. Such language injects drama and elevates 
the importance of events, much more than 
dry but accurate descriptions of the conse- 
quences of interactions between microbe, 
host, immune response and treatment. 

Have Bacteria Won?
HUGH PENNINGTON
Polity: 2015

As a result, the fear of microbial life in
the collective mind is often vastly out of
proportion to the risk. As Pennington puts
it, “the media behave like a cheap refracting
telescope, focusing on an object of interest
but magnifying it with a good deal of aberration
and fuzziness at the edges”. Witness
the media hysteria in
2014 when US nurse Kaci Hickox returned
to Maine from Sierra Leone after working 
with Médecins Sans Frontières (also known
as Doctors Without Borders). Hickox was 
wrongly suspected of infection with Ebola, 
and her return set in train legal proceedings 
relating to her quarantine. 

Oddly, Pennington then fails to heed 
his own critique about rhetoric. He tours 
some of our “victories” (against smallpox, 
diphtheria and syphilis); the “advance” of 
microorganisms such as Escherichia coli and 
MRSA — methicillin-resistant Staphylococcus
aureus — through horizontal transfer of
toxin-encoding genes or selection for antibiotic
resistance; and the “battles” in which
human actions have helped microorganisms 
(including Salmonella and the organisms 
that cause anthrax and legionnaires’ disease). 
He deftly weaves historical vignettes into the 
greater journey. These include early efforts 
to control smallpox in the eighteenth cen- 
tury, led by Lady Mary Wortley Montagu, US 
minister Cotton Mather and latterly Edward 
Jenner; the benefits of improved water avail- 
ability (originally intended to support trade 
and fight fire); sanitation, diet and pasteuri- 
zation in the nineteenth century; and on to 
the discovery of antibiotics in the twentieth 
century, right up to outbreaks of carbapenem- 
resistant Enterobacteriaceae and severe acute 
respiratory syndrome (SARS) coronavirus in 
the modern era. The arc of that story and Pen- 
nington’s accessible prose grip throughout. 
Pennington points out that infection with 
antibiotic-resistant bacteria is not new. It 
has followed closely in the footsteps of all 
antibiotics since penicillin — discovered by 
Alexander Fleming in 1928 — was devel- 
oped as a treatment by Howard Florey and 
Ernst Chain. In another bout of debunking,
Pennington argues that predictions of a
coming antibiotics Armageddon leading to
a substantial increase in infection-related
deaths are greatly exaggerated. On this point,
I take a more cautious line. It is true that
careful management and aseptic technique
can have an important role in husbanding a
dwindling supply of drugs effective against
the most serious infections. However, Pennington
does not devote sufficient space to
the factors that have led the antibiotic-development
pipeline to dry up in recent years.

NYANI QUARMYNE/PANOS PICTURES

He almost trivializes the difficulties in iden- 
tifying relevant natural products or chemical 
constructs and developing them into usable 
drugs, simply writing: “New antimicrobials 
will be very welcome. Getting them ready for 
rollout will be expensive and will take years.” 
And he skates over structural problems in the 
pharmaceutical industry: we urge pharma to 
develop antimicrobials while simultaneously 
planning to limit their use drastically. In the 
twentieth century, drugs came along in time 
to take over when resistance arose; whether 
that will be the future pattern is uncertain. 

Pennington also skimps on coverage of 
microscopic eukaryotic pathogens, such as 
the malarial parasite Plasmodium falciparum 
or the fungi that cause cryptococcal meningi- 
tis. His only mention of malaria is in relation 
to Nobel-prizewinning Austrian physician 
Julius Wagner-Jauregg’s use of Plasmodium
infection as an experimental antimicrobial 
agent to trigger the inflammation necessary 
to kill Treponema, the spirochaete that causes 
syphilis. Yet malaria currently kills more 
than 500,000 people a year, and the spread in 
southeast Asia of resistance to the only effec- 
tive antimalarials is of global concern. 

As Pennington admits, Have Bacteria 
Won? is intentionally biased by his personal 
experience as an infectious-disease specialist 
working in the United Kingdom. It would be 
unreasonable to expect comprehensive cover- 
age in an overview for the generalist. But he 
could have better explored the idea that devel- 
oped countries are over-fearful about infec- 
tious diseases, whereas developing nations 
— struggling with poor sanitation and inad- 
equate clean water, nutrition and health care 
— are at greater, and globally significant, risk.

The book's title notwithstanding, Penning- 
ton extends his analyses to diseases caused 
by viruses, prions and eukaryotic parasites. 
Microbiologists grind their teeth when a well- 
intentioned news report refers to a bacterial 
infection as caused by a virus, or vice versa, 
so why sow more confusion? However, these 
few concerns do not detract from what is an 
entertaining and very well-written primer on 
the human-microbe relationship — one of 
the oldest pairings on Earth.


Andrew Jermy is chief editor of Nature 
Microbiology. 
e-mail: a.jermy@nature.com

Books in brief


Prof: Alan Turing Decoded 

Dermot Turing THE HISTORY PRESS (2015) 

Computing pioneer Alan Turing has been justly and amply lauded, 
not least in these pages (see Nature 515, 195–196 (2014) and
nature.com/turing). Now, a biography by his nephew, Dermot Turing, offers
new sources and a refreshingly familial tone. We see the stubbornly 
original young Alan finding his way in maths and society; mentors 
such as Max Newman, who kick-started Turing’s obsession with 
machines; the design of the electromechanical ‘bombe’ that helped 
to crack the Enigma cipher; and a sensitive reappraisal of Turing’s 
suspected suicide. A measured portrait at ease with its subject. 


The Cabaret of Plants: Botany and the Imagination 

Richard Mabey PROFILE (2015) 

As a celebrant of the botanical, Richard Mabey has few peers. He is
on eloquent form in this portrayal of plants not as dully functional
components of natural capital — a “biological proletariat” — but
as unruly, autonomous and endlessly fascinating. This engaging
scientific and cultural tour takes in ice-age engravings of plant forms; 
ancients and giants such as bristlecone pines and baobabs; the vast 
biodiversity of maize (corn); and, as touched on by plant scientist Ian
Baldwin (Nature 522, 282-283; 2015), Erasmus Darwin’s discovery 
of “irritability” in Mimosa pudica more than 200 years ago. 


Moonstruck: How Lunar Cycles Affect Life 

Ernest Naylor OXFORD UNIVERSITY PRESS (2015) 

Circadian rhythms are dictated by sunlight and stitched into our 
genes. But what of the impact of moonlight on life? Marine biologist 
Ernest Naylor reveals that behavioural patterns linked to lunar phases 
have been found in animals such as the sea louse Eurydice. He also 
examines Moon-related spawning behaviour in marine species such 
as grunion and horseshoe crabs, and the sooty tern (Onychoprion 
fuscatus), with its breeding cycle of ten lunar months. For context, 
Naylor gives us the “full Moon”: the deep history, classical science and 
myth surrounding Earth’s beautiful, enigmatic satellite. 


The Unknown Universe: What We Don’t Know About Time and 
Space in Ten Chapters 

Stuart Clark HEAD OF ZEUS (2015)

It is no revelation that some data on the early Universe sit uneasily 
with the standard model of cosmology. But in his clued-up overview, 
astronomy journalist Stuart Clark’s picture of the yawning gaps in 
our understanding of the cosmos is fuller than most. Clark tacks back 
and forth in the history of astronomy, intertwining the discoveries 
and theories of luminaries from astronomer William Herschel to 
cosmologist Roger Penrose with speculation on prevailing mysteries 
such as the nature of dark matter, dark energy and space-time. 


A Foot in the River: Why Our Lives Change — and the Limits of 
Evolution 

Felipe Fernández-Armesto OXFORD UNIVERSITY PRESS (2015)

Cultural evolutionists paint a partial picture of the speed of change
in human cultures, argues historian Felipe Fernández-Armesto. His
study, springing from a conference sponsored by the Templeton 
Foundation, calls for a new interdisciplinarity. He argues that 
although human culture is born of evolution, it also “changes 
independently of evolution” because it is a “projection of the human 
mind” — and its prodigious imaginative capacity. Barbara Kiser

Correspondence


US environmentalists 
must turn out to vote 


Nico Stehr rightly argues that
democracy is crucial in the
fight against global warming,
attributing the inadequate
response of most democracies
to an overall lack of public
engagement (Nature 525,
449–450; 2015). Our findings at
the Environmental Voter Project
indicate that a contributory factor
could be a lamentably low turnout
by environmentalist voters.

The Environmental Voter 
Project is a non-partisan, 
non-profit organization 
(www.environmentalvoter.org).
We estimate that there
could be almost 16 million 
environmentalists in the United 
States who rarely or never vote 
— around 7% of the country’s 
voting-eligible population. 

To arrive at this figure, we 
brought in data analysts who 
used a data-rich voter file to 
create a national predictive 
modelling survey to identify 
people with a very high 
likelihood of believing that 
climate change is both human- 
induced and a crucial issue. 
They then used public voter 
files to determine the number in 
this group who failed to vote in 
national mid-term elections. 

Until many more 
environmentalists vote, US 
politicians at least are unlikely 
to give environmental issues the 
attention they so badly need. 
Nathaniel Stinnett 
Environmental Voter Project, 
Boston, Massachusetts, USA. 
nathaniel@environmentalvoter.org 


Dutch government 
appeals climate law 


The Dutch government lodged 
an appeal last month against 
The Hague District Court's 
ruling on 24 June that required 
it to make more-drastic cuts to 
the country’s greenhouse-gas 
emissions (see K. Purnhagen 
Nature 523, 410; 2015). The 
government's appeal seems to
be buying time while the courts
decide, which demonstrates the 
weakness of using lawsuits as a 
policy tool for climate change. 

The climate law has been 
hailed as marking a new era of 
environmental activism that 
could spark similar cases in other 
countries. Critics have warned 
against undue politicization 
of the judiciary, which could 
inhibit nations from entering 
into binding international 
commitments. 

The Dutch appeal is likely to be 
based on the government's right 
to determine policy, on whether 
the Kyoto Protocol can have 
such far-reaching effects and on 
the way the District Court has 
defined the state's duty of care. 
Both parties intend to take the 
case to the Dutch Supreme Court, 
which could take several years. 

Meanwhile, the government 
is also awaiting the outcome of 
several studies before launching 
any policy proposals. This will 
not be until next summer at the 
earliest, so the general elections 
in March 2017 could offer a 
faster and more effective means 
of bringing about policy change. 
Hanna Schebesta Wageningen 
University, the Netherlands; and 
European University Institute, 
Florence, Italy. 

Kai Purnhagen Wageningen 
University; and Erasmus 
University Rotterdam, the 
Netherlands. 
kai.purnhagen@wur.nl 


Interdisciplinarity: 
less vague please 


The term ‘interdisciplinarity’ 
is used to cover a diversity of 
practices (see Nature 525, 305; 
2015). What is crucial for one 
kind of interdisciplinarity may 
be immaterial to another. 
Without specificity and 
differentiation, it is impossible 
to identify factors essential 
for success. Relevant features 
include the nature of the 
problem under investigation; the 
number of disciplines involved; 
whether these are closely
aligned or disparate; whether
the interdisciplinary research is
undertaken by an individual or a
team; and whether it is engaged
with policy and end-user practice
(see go.nature.com/ujwu8g).

I investigated one category
of interdisciplinary research, in
which experts from multiple,
diverse disciplines work with
end-users on topical problems,
to determine the specialist
skills required (see
go.nature.com/nnsnsx). Synthesizing
knowledge, managing remaining 
unknowns and supporting 
policy, practice and technological 
change are all essential. Each of 
these skills encompasses an array 
of concepts and methods. 

A new ‘interdisciplinary’ 
discipline such as ‘integration 
and implementation sciences’ 
can capture, assess and transmit 
these skills. It could build a 
college of peer reviewers to 
improve quality and raise 
the visibility and influence of 
interdisciplinarity. 

Gabriele Bammer Australian 
National University, Acton, 
Australia. 
gabriele.bammer@anu.edu.au 


Interdisciplinarity: 
resources abound 


There is growing international 
consensus on best practice in 
interdisciplinary research (see 
Nature 525, 305; 2015). This has 
been spurred by various online 
initiatives. 
Transdisciplinarity-net, 
sponsored by the Swiss 
Academies of Arts and 
Sciences, offers a toolkit of 
useful research strategies (see 
www.transdisciplinarity.ch/ 
toolbox). The Association 
for Interdisciplinary Studies 
provides many resources, 
including an ‘About 
interdisciplinarity’ section that 
outlines definitions and best 
practices (see wwwp.oakland.edu/ais).
A set of useful short
guides is also available (see
go.nature.com/faclve) and the
Australian I2S site for integration
and implementation sciences
provides detailed resources (see
i2s.anu.edu.au). The Science
of Team Science initiative
sponsored by the US National
Cancer Institute addresses
the particular challenges of
conducting research in teams
(see www.teamsciencetoolkit.cancer.gov).

Notable among the many 
books on the topic are Methods for 
Transdisciplinary Research (Univ. 
Chicago Press, 2013) by Matthias 
Bergmann and colleagues and 
Interdisciplinary Research (Sage, 
2011) by Allen Repko. 

Because interdisciplinarity 
is still an emerging approach, 
such recommendations need 
reviewing and updating 
regularly if its potential is to 
be realized — by those who do 
interdisciplinary research and by 
those who study its progress. 
Rick Szostak University of 
Alberta, Edmonton, Canada. 
rszostak@ualberta.ca 


Deposited grants 
buy time in Brazil 


Academics who are paralysed
by Brazil's political and financial
crisis should take heart (see 
Nature 526, 16-17; 2015). 

Funds approved for 2014 by the 
National Council for Scientific 
and Technological Development, 
the country’s most important 
funding agency, were fully 
deposited and are available to 
principal investigators until 2016 
or, in some cases, 2017. 

It is crucial, however, that 
these resources are managed and 
used wisely. Coordinators must 
honour their original funding 
agreements for designated 
projects. 

These guarantees would 
buy enough time for President 
Dilma Rousseff to help to restore 
Brazil's long-standing record 
of growing and consistent 
investment in research. 

Joao Ricardo Mendes de 
Oliveira Federal University of 
Pernambuco, Recife, Brazil.

Turbulence spreads like wildfire


A simple model captures the key features of the transition from smooth to turbulent flow for a fluid in a pipe. The findings
pave the way for more-complex models and may have engineering ramifications. SEE LETTER P.550


MICHAEL D. GRAHAM 


The flow of air over the wing of an
aeroplane is smooth and steady at low
speeds, as if thin layers of air were sliding
over one another. At higher speeds, this
laminar flow becomes turbulent — the steady 
flow gives way to fluctuating eddies that hin- 
der the motion of the aircraft through the air. 
The processes that drive these fluctuations 
become self-sustaining during the transi- 
tion from laminar to turbulent flow, so this 
transition is a window into the origins of fully 
developed turbulence. On page 550 of this 
issue, Barkley et al. present an experimentally
validated model of flows in pipes that captures 
the main features of the transition to the tur- 
bulent regime, and illuminates the onset of 
widespread turbulence. 

Every flow is characterized by a dimen- 
sionless quantity, known as the Reynolds 
number, that measures the importance of a 
fluid’s momentum relative to its viscosity. The 
Reynolds number is low if the fluid’s motion is 
slow, or the spatial extent of the flow is small, 
or the fluid has large viscosity — any pertur- 
bations to the flow will then soon disappear. 
For a fluid that flows in a pipe, turbulence first
occurs when the Reynolds number exceeds
about 2,000: a localized perturbation, such as
a jet issuing from the pipe’s wall, will evolve
into a turbulent puff, a localized patch of tur- 
bulence that travels downstream surrounded 
by regions of laminar flow (Fig. 1). At higher 
Reynolds numbers, an initially localized per- 
turbation evolves instead into a turbulent 
pattern that spreads both upstream and down- 
stream relative to the average speed of the fluid 
in the pipe, leading to what Barkley et al. call 
the “rise of fully turbulent flow”. 
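
As a rough illustration of the Reynolds number described above, the sketch below uses the standard pipe-flow definition (density times mean speed times pipe diameter, divided by dynamic viscosity). The fluid properties, pipe size and speeds are assumed values chosen for illustration; only the approximate threshold of 2,000 comes from the text.

```python
def reynolds_number(density, mean_speed, pipe_diameter, dynamic_viscosity):
    """Dimensionless ratio of momentum effects to viscous effects in a pipe."""
    return density * mean_speed * pipe_diameter / dynamic_viscosity


TRANSITION_THRESHOLD = 2000.0   # approximate value quoted in the text

# Assumed illustrative values, roughly water at room temperature (SI units).
WATER_DENSITY = 1000.0          # kilograms per cubic metre
WATER_VISCOSITY = 1.0e-3        # pascal seconds
PIPE_DIAMETER = 0.02            # metres

slow = reynolds_number(WATER_DENSITY, 0.05, PIPE_DIAMETER, WATER_VISCOSITY)
fast = reynolds_number(WATER_DENSITY, 0.20, PIPE_DIAMETER, WATER_VISCOSITY)

for label, value in (("slow", slow), ("fast", fast)):
    if value > TRANSITION_THRESHOLD:
        regime = "localized turbulence can be sustained"
    else:
        regime = "perturbations decay"
    print(f"{label}: Re = {value:.0f} ({regime})")
```

Quadrupling the speed in the same pipe takes the flow from below the threshold to above it, which is why the same apparatus can show both laminar and transitional behaviour.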

Travelling localized patterns arise in many 
contexts — in forest fires for example. A
forest can be described as bistable: that is, it 
has two steady states, green or burned. A fire 
is a propagating front that connects these states 
in space and time by heating up the trees in 
its path until they catch fire. A large perturba- 
tion like a lightning strike is needed to start the 
fire, but once the fire spreads, the burned forest 
cannot return to the green state. 

A more intricate case is an excitable 
medium such as a nerve fibre. Here, only one
steady state exists: the nerve’s relaxed state. 
Nevertheless, large enough perturbations 
can drive the nerve fibre to a quasi-steady 
‘excited’ state that can persist for some time 
before returning to the steady state. In an 
excitable medium, localized perturbations 
spread by exciting neighbouring regions, and 
lead to excitation pulses that are surrounded
by the relaxed state as they travel through the
medium. 

A mathematical model for patterns in 
bistable and excitable systems needs two ingre- 
dients: one to describe the possible dynamical 
states at each point in the domain (for example, 
the presence of two steady states in the bistable 
case), and the other to describe the communi- 
cation or transport between points, which is 
usually modelled as a simple diffusion process. 
Although they arise in many contexts, models 
that combine these ingredients are generally 
called reaction-diffusion models because of 
the presence of bistability and excitability in 
chemical reaction processes. These models 
can reproduce simple propagating fronts or 
pulses, as well as complex patterns such as 
spiral waves. 
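
The two ingredients can be sketched in a few lines. The example below is a generic bistable reaction-diffusion model on a one-dimensional grid, not the model of Barkley et al.: the cubic reaction term, the parameter values and the function names are all assumptions chosen for illustration. A value of 0 stands for the 'green' state and 1 for the 'burned' state.

```python
def step(u, diffusion=1.0, threshold=0.25, dt=0.2, dx=1.0):
    """Advance the field by one explicit time step."""
    n = len(u)
    new = [0.0] * n
    for i in range(n):
        left = u[max(i - 1, 0)]          # clamped ends approximate zero flux
        right = u[min(i + 1, n - 1)]
        laplacian = (left - 2.0 * u[i] + right) / dx**2
        # Local dynamics: steady states at 0 and 1, separated by the threshold.
        reaction = u[i] * (1.0 - u[i]) * (u[i] - threshold)
        new[i] = u[i] + dt * (diffusion * laplacian + reaction)
    return new


def run(amplitude, steps=1000, cells=200, patch=20):
    """Perturb the first `patch` cells and count cells that end up 'burned'."""
    u = [amplitude if i < patch else 0.0 for i in range(cells)]
    for _ in range(steps):
        u = step(u)
    return sum(1 for value in u if value > 0.5)


ignited = run(amplitude=1.0)   # large perturbation: a front spreads
fizzled = run(amplitude=0.2)   # small perturbation: it dies away
print(ignited, fizzled)
```

The large perturbation launches a front that converts cell after cell, whereas the small one decays, mirroring the lightning strike needed to start a forest fire.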

Because the transition of a flow to a turbulent
state involves travelling fronts, reaction-diffusion
models provide a natural starting point for
descriptions of this transition. A turbulent flow
is dynamically more complex than a system in
a steady or quasi-steady state, but the regime
close to the turbulent transition contains simple
dynamical states that are akin to turbulent
features such as small-scale vortices. Diffusion
provides some transport of the
fluctuations in turbulent flows, but the main
transport mechanism is advection, in which the
turbulent flow is carried downstream through
bulk motion.

Figure 1 | Pipe flow. Barkley et al. present a mathematical model that
captures the evolving characteristics of the flow (arrows) of a fluid in a pipe.
a, In the laminar regime of smooth and steady flow, externally introduced
perturbations soon disappear. b, In the transitional regime, perturbations
evolve into puffs — turbulent regions whose upstream and downstream
fronts propagate at the same speed. The puffs travel without spreading. c, In
the turbulent regime, a localized perturbation spreads until it overruns the
laminar flow that surrounds it.

Barkley et al. introduce the directionality 
imparted by advection into a reaction-dif- 
fusion model, and use the new model to 
reproduce in remarkable detail their observa- 
tions of flows in pipes. In this model there is 
a laminar steady state that is stable to small 
perturbations at all Reynolds numbers. How- 
ever, once the Reynolds number exceeds a 
threshold value, the system becomes excit- 
able, so that large perturbations of the laminar 
state evolve into stable localized puffs that are 
analogous to nerve impulses. As the Reynolds 
number increases further, the system becomes 
bistable: initially localized turbulent patches 
start spreading, first weakly and then strongly, 
invading the laminar regions until those 
‘ignite’ into turbulence. 
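
The effect of adding a preferred direction can be shown with a toy calculation. The sketch below is not the model of Barkley et al.: it adds a constant downstream drift to a generic bistable reaction-diffusion equation, and the drift speed, reaction term and grid settings are assumptions chosen for illustration. It covers only the bistable, spreading regime, not the excitable puffs.

```python
def advect_step(u, speed=1.0, diffusion=1.0, threshold=0.25, dt=0.2, dx=1.0):
    """One explicit step of drift plus diffusion plus bistable reaction."""
    n = len(u)
    new = [0.0] * n
    for i in range(n):
        left = u[max(i - 1, 0)]
        right = u[min(i + 1, n - 1)]
        laplacian = (left - 2.0 * u[i] + right) / dx**2
        drift = -speed * (u[i] - left) / dx     # upwind difference, speed > 0
        reaction = u[i] * (1.0 - u[i]) * (u[i] - threshold)
        new[i] = u[i] + dt * (diffusion * laplacian + drift + reaction)
    return new


def edges(u):
    """Indices of the upstream and downstream ends of the active patch."""
    active = [i for i, value in enumerate(u) if value > 0.5]
    return active[0], active[-1]


# A localized 'turbulent' patch in an otherwise 'laminar' pipe.
u = [1.0 if 100 <= i < 140 else 0.0 for i in range(400)]
start_up, start_down = edges(u)
for _ in range(500):
    u = advect_step(u)
end_up, end_down = edges(u)
print(start_up, start_down, end_up, end_down)
```

Both edges are carried downstream, but the downstream edge moves faster than the upstream one, so the patch widens. Seen from a frame moving with the drift, the two fronts travel in opposite directions, which is the spreading behaviour described in the text.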

Despite having a simple structure, this 
model quantitatively reproduces the veloci- 
ties of fronts propagating both upstream and 
downstream across the Reynolds-number 
regime that corresponds to the transition 
from laminar to turbulent flow. Barkley 
and colleagues thus demonstrate that sim- 
ple advection-enhanced reaction-diffusion 
models can capture the large-scale charac- 
teristics of a pipe flow’s transition regime. 
This accomplishment highlights the impor- 
tance of a dynamics-based approach to the 
understanding of turbulence.

The present model has certain limitations 
that could be addressed in the future. A key 
simplification is the representation of tur- 
bulence by a quasi-steady or steady state. 
This precludes the model from capturing the 
internal time and length scales of the fluctua- 
tions, which vary between a turbulent region's 
interior and its edges. Models that can capture 
more of the dynamics in the turbulent regions 
could better follow the evolution of turbulence 
in space and time, and might further elucidate 
the transition process, including the presence 
of intermittent laminar flows surrounded by 
turbulence. Furthermore, the transition and 
turbulent regimes in polymer solutions or
in fluids that contain small particles can
substantially different from the simple fluids 
described in the current work; extending the 
experiments and the model to such systems 
would be of broad interest. 

Finally, Barkley and colleagues’ study is 
restricted to flows that have only one extended 
spatial dimension, in which the travelling pat- 
terns are simple pulses. Flows in wide channels 
or over aircraft wings have an extra extended 
dimension: in such cases, the transition to the 
turbulent regime gives rise to complex patterns 
such as diamond-shaped spots and stripes that 
are oriented obliquely relative to the main flow 
direction. It would be valuable to develop
models for such flows and for various types of 
perturbations that can trigger turbulent transitions
in fluids. This would be of substantial interest
to engineers who would like to reduce drag
in flows (for example, air drag over wings) by
controlling the transition process with suitably
designed structures. After all, the best control
algorithms are built around a mathematical
model of the process to be controlled.

Chronic effects
of acute infections 


Acute infection of mice with an intestinal pathogen leads to long-lasting 
inflammation that is maintained by intestinal microorganisms. This observation 
reveals a path by which infection history can affect long-term immune function. 


NICOLA HARRIS 


Our bodies’ history of infections shapes
our immune system and can influence
the development of subsequent diseases,
including inflammatory bowel disease
and autoimmune disorders. It has also been
postulated that individuals’ past infections can
undermine vaccine programmes, particularly
in developing nations. For certain cases, such
as infection with Streptococcus pyogenes bacteria
and rheumatic heart disease, this link can
be explained by the presence of similar antigens
(proteins against which the immune system
reacts) in both the pathogen and the host.
Writing in Cell, Morais da Fonseca et al. map
a different pathway by which infections alter
immune status. The authors observe that mice
infected with the common intestinal bacterial
pathogen Yersinia pseudotuberculosis have an
altered long-term ability to react to experimental
antigens that mimic human exposure
to food or oral vaccines.

Figure 1 | Leaky lymphatics. Lymphatic vessels in the gut’s mesenteric adipose tissue carry dendritic
cells (DCs) of the immune system from the intestine to the mesenteric lymph nodes. There, the DCs
play a key part in initiating intestinal immune responses, including those that allow the body to
tolerate ‘foreign’ substances (antigens) from food or to respond to oral vaccines. Morais da Fonseca
et al. show that mice that have recovered from an acute intestinal infection with the bacterium Yersinia
pseudotuberculosis exhibit persistent inflammation in the mesenteric adipose tissue that results in ‘leaky’
lymphatics and the loss of migrating DCs into the surrounding tissue. As a consequence, the animals have
impaired intestinal immune responses.

Morais da Fonseca and colleagues show
that Y. pseudotuberculosis infection results in
an acute inflammatory episode in the intes- 
tine and its associated tissues. However, the 
surprise came with their observation that, 
well after the mice had cleared the pathogen, 
they continued to have swollen lymph nodes 
(lymphadenopathy) and inflamed mesentery 
— a fold of tissue that connects the intestine to 
the body wall. The mesentery is rich in adipo- 
cytes (fat cells) and contains lymphatic vessels 
(which carry fluid and cells from the tissues 
to the lymph nodes) through which special- 
ized immune cells called dendritic cells (DCs) 
travel. Although DC migration through mes- 
enteric lymphatic vessels occurs uninterrupted 
in uninfected mice, the researchers observed 
that mice that had recovered from Y. pseudo- 
tuberculosis infection had ‘leaky’ lymphatic 
vessels, resulting in the premature exit of DCs 
and fluids (Fig. 1). This means that these cells 
failed to complete their journey to the mesen- 
teric lymph nodes, where they are essential for 
initiating immune responses. 

To examine the consequences of persistent 
lymphatic leakiness in such mice, the authors 
investigated how these animals responded to 
previously unencountered intestinal antigens. 
Using experimental models that mimic oral 
vaccination or food tolerance (a process by 
which the immune system prevents reactivity 
to antigens found in the diet), they found that 
both types of response were compromised in 
mice that had previously had an acute infection 
with Y. pseudotuberculosis. 

The researchers also observed increased
levels of IL-18 and TNFα in the mesenteric
adipose tissue following recovery from
Y. pseudotuberculosis infection. These inflammatory
cytokines (proteins involved in
intercellular communication) are known to
promote lymphatic leakiness. Using germ-free
mice, which lack normal resident microbial
populations, Morais da Fonseca et al.
demonstrate that the intestinal microbiota is 
necessary for the persistent inflammation and 
lymphatic leakiness after acute infection. This 
observation prompted the authors to examine 
whether antibiotic treatment might reverse 
the negative effects of Y. pseudotuberculosis on 
the host’s immune status. Indeed, they found 
that a short course of antibiotics following 
recovery from Y. pseudotuberculosis infection 
reduced ongoing mesenteric inflammation 
and restored immune responsiveness to oral 
vaccination. 

Why and how the intestinal microbiota 
drives continued mesenteric inflammation 
after infection has cleared awaits future study. 
Because mice infected with Y. pseudotubercu- 
losis showed no gross long-term differences in 
the composition of their intestinal microbial 
communities compared with uninfected ani- 
mals, it is probable that inflammation resulted 
from an altered response of the host to the 
microbiota. In healthy animals, microbial 
communities are restricted to the intestinal 


lumen. However, a variety of injuries or 
environmental stressors can allow increased 
numbers of bacteria to cross the intestinal 
barriers and enter the underlying tissues (a 
phenomenon called bacterial translocation). 
Although not experimentally addressed by 
the researchers, it is possible that Y. pseudo- 
tuberculosis infection causes long-term 
changes in intestinal physiology that result 
in chronic low-level bacterial translocation 
and entry of these cells into the mesenteric 
lymphatics. People with inflammatory bowel 
disease (IBD) display increased bacterial 
translocation, and Morais da Fonseca and colleagues
observed leaky lymphatics in a mouse
model of IBD. Further investigation is needed 
to determine whether this process also occurs 
in humans, and to assess its possible impact on 
disease severity or immune status. 

This work emphasizes the idea that infection 
history can shape inflammation by disrupting 
fundamental pathways that are required to 
initiate immune responses. The unveiling of 
lymphatic leakiness as one such pathway is 
an advance in the field that is likely to prompt 
closer examination of this pathway in diseases 
that exhibit chronic inflammation, including 
IBD, autoimmune diseases and obesity. Might 
we be able to counter the effects of infec- 
tion history with interventions that improve 
lymphatic function or restore localized 
immune responsiveness? Such interventions 
could aim to resolve chronic inflammation, 
or to target the intestine to improve barrier 
function and prevent bacterial translocation.

Currently, infection history is determined 
one pathogen at a time, and is based on detect- 
ing pathogen-specific antibodies that are made 
by the immune system in response to pathogen 
infection and can persist for decades. However,
an exciting study this year has reported
an antibody-based high-throughput method 
that uses a single drop of blood to document 
a person's previous exposure to more than 
200 viruses. Expansion of this technology to 
other pathogens (including bacteria, protozoa 
and fungi) would allow comparisons between 
infection history and health status, potentially 
uncovering links between specific pathogens 
and diseases. Eventually, the combination of 
high-throughput screening of infection history 
and targeted therapy might be used to prevent 
disease in ‘at risk’ individuals.
Small glacier has big effect on sea-level rise


Models of the West Antarctic Ice Sheet predict substantial ice loss over the next 
few centuries, and indicate that a glacier expected to contribute greatly to sea-level rise
may already be unstable.


NATALYA GOMEZ 


Sea-level rise is projected to displace
communities around the world in the
coming centuries. In 2013, the Intergovernmental
Panel on Climate Change
identified the potential runaway retreat of
marine sectors of the West Antarctic Ice Sheet 
as a major source of uncertainty in predictions 
of sea-level rise; these sectors contain enough 
ice to raise average global sea levels by several 
metres. Writing in The Cryosphere, Cornford 
et al. present modelling of the West Antarctic
Ice Sheet’s response to a warming climate in 
the next 300 years. Their findings show the 
potential for substantial retreat, and identify 
the Thwaites Glacier, a deep-rooted ‘outlet’
glacier in the Amundsen Sea Embayment
thought to be on the verge of rapid retreat,
as probably the largest potential source of sea- 
level rise over the next few centuries. 

Marine sectors of ice in the West Antarctic 
are kilometres thick and, in some places, sit on 
bedrock that lies more than a kilometre below 
the height of the sea surface. These ice sheets 
gain mass through snowfall from above, flow 
outwards under the influence of gravity, and 
lose mass mostly through fast-flowing streams 
of ice, called outlet glaciers, that feed into ice 
shelves floating in the surrounding ocean. The 
grounding line of a marine ice sheet is the zone 
around the edge of the sheet where the ice is 
just thin enough to float, separating grounded
ice sitting on land in the interior from the
floating ice shelves.

Figure 1 | The Antarctic ice sheet. Marine sectors of ice in Antarctica lose mass mostly through fast-flowing outlet glaciers (purple) that feed into ice shelves in the surrounding ocean. Two of these glaciers, the Thwaites Glacier and the Pine Island Glacier, feed into the Amundsen Sea. Cornford et al. have modelled the West Antarctic Ice Sheet’s response to climate warming in the next 300 years. They find that the Thwaites Glacier could already be undergoing runaway retreat, and that it may be a substantial source of future, century-timescale sea-level rise. Figure adapted from ref. 12.

In a warming climate, marine sectors of 
ice are thought to be particularly vulnerable. 
Runaway ice-sheet retreat associated with 
instability of the grounding line can occur 
when the bed of a marine outlet glacier deepens
upstream of the grounding line, as is the
case for most of the major outlet glaciers in
the West Antarctic. The floating ice shelves
on the periphery of these ice sheets stabilize 
the outlet glaciers, inhibiting the ocean-bound 
flow of grounded ice and slowing ice loss. But 
when the ocean warms, these buttressing ice 
shelves are melted from below and can break 
up, initiating faster ice flow and rapid retreat 
of the grounding line.
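
The flotation condition that defines the grounding line, and the bed-slope criterion for instability described above, can be illustrated with a short sketch. It is a minimal illustration only, not the treatment used in any ice-sheet model: the densities are typical textbook values, and the function names and the example profile are hypothetical.

```python
# Assumed, typical densities (not taken from the article).
RHO_ICE = 917.0    # kg m^-3, glacier ice
RHO_SEA = 1028.0   # kg m^-3, sea water


def flotation_thickness(bed_elevation_m):
    """Ice thickness at which ice resting on this bed would just float.

    bed_elevation_m is measured relative to sea level (negative = below).
    """
    water_depth = max(0.0, -bed_elevation_m)
    return RHO_SEA / RHO_ICE * water_depth


def is_grounded(thickness_m, bed_elevation_m):
    """Ice is grounded when it is too thick to float."""
    return thickness_m > flotation_thickness(bed_elevation_m)


def find_grounding_line(bed_m, thickness_m):
    """Index of the last grounded point, walking from the interior seaward.

    Returns None if even the first point is afloat.
    """
    last_grounded = None
    for i, (bed, thickness) in enumerate(zip(bed_m, thickness_m)):
        if not is_grounded(thickness, bed):
            break
        last_grounded = i
    return last_grounded


def bed_deepens_upstream(bed_m, gl_index):
    """True if the bed just inland of the grounding line is deeper than at it."""
    if gl_index is None or gl_index == 0:
        return False
    return bed_m[gl_index - 1] < bed_m[gl_index]


# Hypothetical flowline, interior first: bed elevation and ice thickness in metres.
example_bed = [-1200.0, -1000.0, -800.0, -600.0, -600.0]
example_thickness = [2000.0, 1500.0, 1000.0, 600.0, 300.0]

gl = find_grounding_line(example_bed, example_thickness)
unstable_geometry = bed_deepens_upstream(example_bed, gl)
```

In this made-up profile the bed lies deeper inland of the grounding line than at it, which is the geometry associated with runaway retreat.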

Cornford et al. predicted the impact of 
ongoing climate warming on the stability 
of the West Antarctic Ice Sheet by using the 
BISICLES ice-sheet model with a sophisticated
treatment of the grounding line, forced
by a suite of the most recently available atmosphere
and ocean model projections. Marine
ice sheets interact strongly with both the 
atmosphere and the ocean, and the compu- 
tational expense of coupling ice-sheet models 
to state-of-the-art climate models with a full 
range of ocean–atmosphere interactions is
currently prohibitive. The approximate treat- 
ment adopted by the authors is among the best 
available methods with which to model this 
complex coupling. 

Simulating the migration of the grounding
line requires very high spatial resolution, and
this limits the spatial and temporal scales that 
regional ice-sheet models can consider. To 
clear this technical hurdle, Cornford and col- 
leagues used a numerical approach known as 
adaptive mesh refinement to focus in on the 
ice flow at the critical zone near the ice sheet’s 
grounding line. The authors’ treatment does 
not, however, take into account some of the 
factors that affect ice-sheet dynamics, such as 
changes in the elevation of Earth's solid surface 
beneath the ice, and depression of the local sea 
surface as the gravitational attraction of the ice 
sheet on the surrounding water weakens. It
also neglects some processes that take place at 
the ice-bed interface, which are challenging 
to observe. All of the above factors can change 
the timing and extent of ice-sheet retreat in 
some regions. But no existing ice-sheet model 
accounts for all of these effects. 
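
The idea behind adaptive mesh refinement, spending resolution only where it is needed, can be sketched in one dimension. This is a toy illustration under stated assumptions, not the scheme used by Cornford and colleagues: the function name, the refinement rule and the numbers are all hypothetical.

```python
def refine_near(edges, x_target, radius, levels):
    """Repeatedly bisect cells that overlap a window around x_target.

    edges    : sorted list of cell edges (for example, km along a flowline)
    x_target : position of the feature to resolve (for example, the grounding line)
    radius   : half-width of the refinement window at the first level
    levels   : number of refinement passes; the window halves after each pass
    """
    edges = list(edges)
    for _ in range(levels):
        refined = [edges[0]]
        for left, right in zip(edges[:-1], edges[1:]):
            # Split only cells that strictly overlap the current window.
            if left < x_target + radius and right > x_target - radius:
                refined.append(0.5 * (left + right))
            refined.append(right)
        edges = refined
        radius *= 0.5
    return edges


# Hypothetical example: a 400 km flowline with 100 km cells and a
# grounding line assumed to sit at 250 km.
coarse = [0.0, 100.0, 200.0, 300.0, 400.0]
mesh = refine_near(coarse, x_target=250.0, radius=50.0, levels=3)
widths = [right - left for left, right in zip(mesh[:-1], mesh[1:])]
```

The refined mesh keeps 100 km cells far from the assumed grounding line and uses much finer cells next to it, so the total number of cells grows far more slowly than it would under uniform refinement.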

Cornford and colleagues’ simulations show 
that the grounding line retreats by hundreds 
of kilometres in all major marine outlet gla- 
ciers in West Antarctica when ice shelves are 
melted from below and break apart. However, 
recent projections using ocean-circulation 
models indicate that ocean warming sufficient
to break apart ice shelves in the coming
decades will occur only in the Amundsen Sea 
Embayment, into which the Thwaites and Pine 
Island glaciers flow (Fig. 1). When the authors 
used these more-realistic projections of sub- 
ice-shelf melt rates to drive their simulations, 
they predicted a contribution of up to 50 mm 
of global average sea-level rise from the West
Antarctic Ice Sheet by 2100, and up to 150 mm
by 2200, the majority of which derives from ice
loss in the Amundsen Sea Embayment.

50 Years Ago

There is an inborn fascination ... in
the discovery and unearthing of
relics of life as it existed centuries
ago. None of these is perhaps
more generally exciting and
popular than the Roman mosaic
pavements ... constructed floorings
in regular cubes of stones of many
colours, beautifully contrived in
patterns and pictures ... Proof
of life as it existed in Yorkshire
during A.D. 100-400, as evidenced
by the discovery of two excellent
examples of Roman mosaic
handiwork, is furnished by the
Rudstone pavement ... and by one
at Brantingham ... At Rudstone
three mosaics were originally
uncovered ... and were ultimately
removed to Hull Museums ... It is
indeed no boastful claim that “...
these beautiful pavements are now
permanently preserved for the
benefit of posterity”.

From Nature 23 October 1965

100 Years Ago

The autumn number of Bird Notes
and News contains much readable
matter in regard to the effect of
the war on bird-life in France and
Flanders. Swallows returning
this spring to their accustomed
nesting sites only too often found
them reduced to a heap of ruined
masonry. In such cases huts erected
for military purposes have been
adopted as substitutes. This fact
shows the tenacity with which these
birds cling to their old haunts. Birds
roosting between the lines of the
opposing forces have on more than
one occasion given timely warning
to the sleeping men of the near
approach of poison gas fumes, by the
rustle of their wings and low cries
as they passed over our trenches.
Except, indeed, when actually within
the zone of fire the birds have shown
themselves strangely indifferent to
the strife around them.

From Nature 21 October 1915

In addition, their results highlight that the 
stability of the Thwaites Glacier in particular 
strongly depends on the initial modern state 
of the ice sheet and bedrock adopted in the 
simulations. Under a range of reasonable ini- 
tial conditions, the modelled Thwaites Glacier 
retreats immediately and rapidly, even without 
the added forcing of ocean warming breaking 
up the ice shelf. 

These results suggest that strong marine 
ice-sheet instability may already be under 
way on Thwaites, even without the help of ice- 
shelf break-up. This conclusion is supported 
by other recent investigations into the fate of 
the Antarctic Ice Sheet, notably a modelling 
study of the Thwaites Glacier and observation-based
work on outlet glaciers in West
Antarctica. The emerging picture highlights 
an urgent need for further observational and 
modelling explorations of the Amundsen Sea 
Embayment. Efforts to develop and compare 
models focused on the future of the Antarc- 
tic Ice Sheet in a warming climate are already 
under way (see ref. 11, for example).
Kidney tissue grown from induced stem cells


Engineered human cells that can give rise to every cell type have been induced to 
generate structures that resemble an embryonic kidney. This advance charts a 
course towards growing transplantable kidneys in culture. SEE LETTER P.564 


JAMIE A. DAVIES 


Kidney diseases are becoming increasingly
common, and there is a shortage
of transplantable organs with which to
combat them. One solution is to build human 
kidneys from stem cells. But before this can 
be done, several difficult problems must be 
solved. On page 564 of this issue, Takasato et al.
report taking an important step towards
building stem-cell-derived kidneys. 

The path from a stem cell to an engineered 
kidney involves multiple steps. First, stem cells 
must be persuaded to develop into kidney cells, 
rather than those of other tissues. Second, once 
committed to such specialization, the cells 
must be encouraged to build the intricate, 
complex anatomy of the kidney. Third, the 
cultured kidneys must be coaxed to grow and 
function in a host patient. 

Researchers have been making steady 
progress on the second and third of these steps 
since 1910, when kidney rudiments were first 
cultured in vitro. Subsequent breakthroughs
have enabled both the production of sus- 
pensions of fetal animal renogenic (kidney- 
creating) cells that, in culture, self-organize 
into small organs and arrangements of tissue 
called organoids, and the transplantation of
fetal animal kidneys into adult animals. But
these advances are of little medical use without 
conquering the first step: finding a technique
for producing renogenic cells, and the vascular
progenitors that surround them, from healthy 
human tissues. 

One way to obtain renogenic cells is to grow 
them from induced pluripotent stem (iPS) 
cells — adult cells that have been converted 
in culture to a pluripotent state, from which 
they can become any cell type in the body.
During embryonic development, this process 
of specialization proceeds through inter- 
mediate cell types, and the transition from 
one stage to the next is triggered by specific 
signalling proteins such as Wnts or fibroblast 
growth factors (FGFs). Therefore, biologists 
who want to direct iPS cells to develop into 
a particular tissue typically treat them either 
with signalling molecules or with drugs that 
mimic these molecules, in an embryologically 
inspired sequence. 

But designing effective protocols for iPS-cell 
development can be challenging — especially, 
for example, if the target tissue contains multi- 
ple cell types, or if it arises late in embryonic 
development, many stages away from the 
pluripotent state. Both of these are true of 
the kidney, which begins to develop when a 
human embryo is five weeks old, and which 
contains cells derived from at least two reno- 
genic progenitor cell types: the ureteric epithe- 
lium, which gives rise to collecting ducts that 
help to maintain the body’s balance of fluids 
and electrolytes, and the metanephrogenic 
mesenchyme, which matures into nephrons 
that mediate excretion. 

Figure 1 | From stem cell to self-organizing tissue. Takasato et al. developed a protocol for growing organized kidney organ buds in vitro. Induced pluripotent stem (iPS) cells, which can give rise to any cell type, were exposed to signals from Wnt molecules for four days, and then to molecular FGF signals for five days. Wnt signalling produced a physiological balance between two kidney progenitor cell types: the metanephrogenic mesenchyme (MM) and the ureteric epithelium (UE). The authors transferred these cells to a 3D culture system and then exposed them to another pulse of Wnt signalling, which triggered further development: the cells differentiated and organized themselves into nephrons and pieces of collecting duct that resembled those in an embryonic human kidney.

Given the urgent need for kidneys for
transplantation, researchers have long been
working to obtain renogenic cells from various
types of animal and human pluripotent
stem cell. A pioneering study demonstrated
that pluripotent cells called mouse embryonic 
stem cells can be persuaded to express genetic 
markers of the metanephrogenic mesenchyme 
and to integrate into host kidneys, albeit with 
low efficiency. Since then, however, despite 
incremental improvements, efficiency has con- 
tinued to be a problem, and too many research- 
ers have relied on marker expression alone as 
an indicator of success. Yet marker expression 
is no guarantee that cells will produce a safe, 
functional tissue — as exemplified by cancer 
cells that express markers of the tissue from 
which they arose. 

Takasato et al. built on previous work to 
devise a protocol to efficiently turn human 
iPS cells into metanephrogenic mesenchyme 
and ureteric epithelium, with the correct bal- 
ance of cell types (Fig. 1). Their advance was 
made possible by an improved understanding 
of the embryonic origin of the two stem-cell 
types: in particular, the realization that cells 
that will give rise to ureteric epithelium are 
exposed only briefly to Wnt signals, whereas 
those that will give rise to metanephrogenic 
mesenchyme come from cells that have been 
exposed for longer. Their protocol thus opti- 
mizes the duration of iPS-cell exposure to 
Wnt-mimicking drugs to produce a balance 
between the two stem-cell types that approxi- 
mates the ratio seen in vivo. This is followed 
by exposure to FGF signals, as would occur in 
a human embryo. 

When the researchers cultured the cells as 
a 3D aggregate, and provided a second, ‘trig- 
ger’ Wnt signal, the metanephrogenic mes- 
enchymal cells developed into nephrons, and 
the ureteric epithelial cells became collecting 
ducts. The nephrons matured and produced 
a sequence of specialized segments that 
mimicked those in an embryo, along with 
the connective tissue and vascular progeni- 
tor cells that surround embryonic nephrons. 
Gene expression was comparable with that 
in first-trimester human fetal-kidney tissue. 
And the maturing nephrons took up labelled 
tracer molecules, suggesting that they are 
functional. 

It is vital to emphasize that the result of this 
process is not a kidney, but an organoid. The 
structure's fine-scale tissue organization is 
realistic, but it does not adopt the macro-scale 
organization of a whole kidney. For example, 
it is not ‘plumbed’ into a waste drain, and it 
lacks large-scale features that are crucial for 
kidney function, such as a urine-concentrat- 
ing medulla region containing mature forms 
of structures called loops of Henle and radi- 
ally arranged collecting ducts. There is a long 
way to go until clinically useful transplantable 
kidneys can be engineered, but Takasato and 
colleagues’ protocol is a valuable step in the 
right direction. 

Even so, these kidney organoids may fulfil 
a different medical need — the ability to test 
drug safety on human kidney tissue, rather
than in poorly predictive animals. The cell
types that are most vulnerable to damage by 
drugs are present in the organoids, and the 
authors provide preliminary evidence to dem- 
onstrate that the system is indeed damaged 
by a known renal toxin. It is to be hoped that
Takasato et al. will team up with toxicologists 
to perform a full-scale study on the screening 
potential of their system. The result could be 
a major step towards animal replacement and 
improved safety screening for drugs, as well as 
towards transplantable kidneys.
Conductive consortia


Physiological analyses, electron microscopy and single-cell chemical imaging 
suggest that direct electron transfer occurs between the members of methane- 
oxidizing microbial consortia. SEE ARTICLE P.531 AND LETTER P.587 


MICHAEL WAGNER 


Emission of the greenhouse gas methane
from the seabed is controlled by its
anaerobic oxidation coupled with sulfate
reduction. This globally important process, 
which consumes most of the methane released 
and thus regulates the climate on our planet, 
is often mediated by dense aggregates of two 
specialized microorganisms: anaerobic meth- 
anotrophic archaea and deltaproteobacteria. 
These bi-species consortia were discovered 
about 15 years ago, but how the partners make
a living from this low-energy-yielding process 
has been a mystery, although it has repeatedly 
been speculated that exchange of a diffusible 
metabolite between the partner microbes is 
essential. Two papers in this issue challenge
this hypothesis. McGlynn et al. (page 531)
demonstrate that the relationship between spe- 
cies intermixing and cellular activity patterns is 
inconsistent with transfer of a diffusible compound,
and Wegener et al. (page 587) show
with physiological experiments that transfer 
of intermediates cannot explain the growth of 
the aggregates in culture. 

Within the consortia, the anaerobic metha- 
notrophic archaea (ANME) oxidize methane 
to carbon dioxide by reversing the classical 
pathway of methanogenesis. This process is
energetically favourable only if the resulting 
electrons are efficiently transferred to sulfate. 
Several hypotheses have been put forward 
to explain how this is achieved (Fig. 1). One 
theory proposes that electrons are transferred
from the archaea to their deltaproteobacterial
sulfate-reducing partners through
the production and consumption of a diffusible
metabolite, such as hydrogen, formate or
methanethiol. This model is consistent with 
long-known strategies for electron transfer 
between microbial species, but experimental 
evidence that it occurs in this system is lacking. 
An alternative model, based on a wide array
of experimental data, predicts that ANME
can autonomously perform anaerobic methane
oxidation by reducing sulfate (SO₄²⁻) to
zero-valent sulfur (S⁰). This reacts with environmental
sulfide to form sulfur compounds
that are used by the deltaproteobacteria, which 
act as commensal organisms — they benefit 
from the relationship without affecting the 
archaea. Although this model can explain the 
frequently observed occurrence of ANME in 
the environment without deltaproteobacteria,
the enzymatic machinery for sulfate reduction 
in ANME has not yet been discovered. 
McGlynn et al. now use chemical imaging 
and isotope labelling to measure the meta- 
bolic activity of single cells in 62 consortia 
from deep-sea sediments at an active methane 
source, and show that cellular activity patterns 
are independent of the distance between part- 
ner cells. Furthermore, they find that the activ- 
ities of entire aggregates are not related to the 
spatial distribution of the microbial members. 
These findings are in marked contrast to 
theoretical predictions on activity patterns in 
multispecies consortia that are driven by the 
exchange of microbial metabolites. According 
to such predictions, interacting species in close 
contact with each other would be more active 
than those separated by greater distances, and 
thus well-intermixed aggregates should be 
more active than poorly intermixed
consortia. And in the alternative
model, in which ANME reduce the
sulfate and feed sulfur compounds to
the deltaproteobacteria, one would
expect archaeal activity to be unre- 
lated to bacterial activity, but that 
bacterial activity would still be higher 
the closer the bacterial cells are to the 
archaeal cells. This pattern was also not 
observed by McGlynn and colleagues. 

Wegener et al. tested how the in vitro 
activity of consortia microbes enriched 
from a deep-sea sediment responded 
to the addition of various organic and 
inorganic compounds (including zero- 
valent sulfur) that could be produced 
by the ANME and consumed by the 
bacteria. Only the addition of hydro- 
gen stimulated sulfate reduction in 
the cultures, suggesting that this is the 
only compound suitable for shuttling 
electrons to the deltaproteobacteria. 
Successful cultivation of the deltapro- 
teobacteria alone with hydrogen as 
the only energy source confirmed this 
finding. However, the authors report 
that the level of hydrogen produced 
by the consortia when sulfate reduc- 
tion is inhibited is too low to explain 
the growth of the deltaproteobacteria. 
Thus, although hydrogen may be used 
as a growth substrate for the deltapro- 
teobacteria, it is clearly not the driver 
of anaerobic methane oxidation by the 
consortia. 

But if interspecies metabolite 
exchange is not happening in these 
consortia, how might they jointly 
generate energy? Stimulated by previous
speculations, McGlynn et al. and
Wegener et al. hypothesized that direct 
interspecies electron transfer (DIET)
occurs between ANME and their 
bacterial partner through electrical 
connections. This trick would enable 
the deltaproteobacteria to efficiently 
reduce sulfate by using the electrons generated
by the archaea from methane oxidation.
McGlynn and colleagues present modelling 
data predicting that, in aggregates powered by 
DIET, there would be little correlation between 
the activity and the spatial distribution of cells 
of the two partner species, more in line with 
their experimental data. 

Figure 1 | Interaction modes. Anaerobic methanotrophic archaea (ANME) and deltaproteobacteria are marine microorganisms that together mediate the anaerobic oxidation of methane coupled with the reduction of sulfate. Three strategies for this interaction have been postulated. a, One proposes that ANME oxidize methane (CH₄) to carbon dioxide and transfer the electrons (e⁻) obtained to deltaproteobacteria through diffusible metabolites such as hydrogen, formate or methanethiol. The bacteria use the electrons to reduce sulfate (SO₄²⁻) to sulfide (HS⁻). b, McGlynn et al. and Wegener et al. suggest that the same oxidation and reduction reactions take place, but that direct electron transfer occurs by means of electrical connections involving multi-haem cytochrome proteins and appendages called type IV pili. c, An alternative hypothesis is that ANME oxidize methane but also reduce sulfate to zero-valent sulfur (S⁰), which, after release, forms disulfide (HS₂⁻) in the presence of environmental sulfide. The deltaproteobacteria convert the disulfide to sulfide and sulfate, but, in contrast to the interactions in a and b, this activity is not necessary for powering anaerobic methane oxidation.

DIET has recently been recognized as an
alternative to interspecies transfer of diffusible
intermediates such as hydrogen and formate,
and methane-producing microorganisms
have been shown to accept electrons by this
mechanism. By analysing available genomic
information for ANME and the deltaproteobacteria,
the two research groups discovered
genes encoding large multi-haem cytochromes
(proteins that mediate electron transport) and
type IV pili (cellular appendages), which are
both hallmarks of organisms capable of extracellular
electron transfer. Consistent with this
finding, McGlynn and colleagues show that 
the extracellular space in the aggregates could 
be stained with a cytochrome-reactive com- 
pound, suggesting that conductive haem pro- 
teins are present between the ANME and the 
bacterial cells. Wegener et al. even managed 
to directly visualize nanowire-like structures, 
10 nanometres thick and up to 1 µm long, that
connect the ANME and deltaproteobacterial 
cells. These structures were not seen on the 
surface of the deltaproteobacteria in cultures 
without ANME, lending further support to 
their role in connecting the two partners. 
The results presented in these two papers 
are a major advance in microbiologists’ strug- 
gle to decipher the enigmatic metabolism of 
anaerobic-methane-oxidizing consortia, and 
they provide the first experimental
support for DIET in these assem- 
blages. One of the biggest remain- 
ing challenges is to prove this mode 
of interaction unequivocally. Can 
ANME grow on anodes (electron- 
accepting electrodes) without their 
bacterial partners, and if so, what 
mechanism do they use for electron 
transfer? Members of the consortia 
cannot yet be cultured in isolation or 
genetically manipulated, so it will not 
be a straightforward matter to further
characterize the role of the multi-haem 
cytochromes and the type IV pili. A 
logical next experiment would be to 
identify the location of these mole- 
cules in the aggregates. Furthermore, 
it will be fascinating to investigate 
whether DIET enables partnerships 
of ANME with other microbes and 
provides access to electron acceptors 
other than sulfate. 

With these findings, anaerobic 
methane oxidation becomes another 
hot candidate for the increasing num- 
ber of processes recognized as being 
driven by electromicrobiology. A
pressing task now is to determine 
how widespread DIET is in the vari- 
ous known ANME and deltaproteo- 
bacterial partner lineages. Interaction 
strategies between microorganisms 
can be subject to rapid evolution, and 
it is thus conceivable that phyloge- 
netically identical or closely related 
partner species use different modes of 
interaction.
JANE GOULD/ALAMY

Mangrove maintenance


The stilt-rooted trees of mangrove forests 
host rich biological diversity, as well as 
supporting fisheries and protecting shores 
from storm damage and erosion. These 
tidal-zone trees can maintain an appropriate 
soil elevation for local sea levels and 
inundation rates by accreting sediment 
or organic material around their roots 
(pictured, mangroves in Indonesia). But on 
page 559 of this issue, Lovelock et al. (C. E. 
Lovelock et al. Nature 526, 559-563; 2015) 
show that for many forests, current rates of 
sea-level rise outpace this adaptive capacity. 
Assessing 27 sites across the Indo-Pacific, 
the authors find that sediment availability 
is a key survival factor for mangroves in 
the region. But river damming and land- 
use change are reducing sediment supply. 
The researchers’ modelling predicts that, 
at current rates of sea-level rise, many 
mangrove forests could be submerged by 
2070. Marian Turner 
A glimpse of Earth’s fate


Analysis of data from the Kepler space observatory and ground-based telescopes 
has led to the detection of one, and possibly several, minor planets that are in a
state of disintegration in orbit around a white dwarf star. SEE LETTER P.546 


FRANCESCA FAEDI 


What will Earth’s fate be when the
Sun dies? Writing on page 546 of
this issue, Vanderburg et al. offer a
dramatic answer in their discovery of one, or
possibly several, dying minor planets, which 
have a rocky bulk and chemical composition 
similar to Earth's. These small bodies are in a 
tight orbit around a white dwarf, the remnant 
of a Sun-like star that has reached the end of its
life, and they are being shredded to pieces by 
the star’s strong gravity and radiation field. The 
authors’ observations of the system reveal mul- 
tiple transit signals — the periodic dimming of 
the stellar light caused by passing foreground 
objects — induced by one or several disin- 
tegrating bodies that have orbital periods of 
4.5 to 4.9 hours. 
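
For a sense of scale, Kepler's third law converts these orbital periods into orbital distances. The sketch below is a rough estimate only: the white-dwarf mass of 0.6 solar masses is an assumed, typical value that is not given in the text above, and the function name is hypothetical.

```python
import math

GM_SUN = 1.32712440018e20   # m^3 s^-2, gravitational parameter of the Sun
R_SUN = 6.957e8             # m, nominal solar radius, used only for comparison


def orbital_distance_m(period_hours, star_mass_solar=0.6):
    """Semi-major axis of a small body on a circular orbit (Kepler's third law).

    a^3 = G M P^2 / (4 pi^2); the orbiting body's own mass is neglected.
    star_mass_solar defaults to an ASSUMED typical white-dwarf mass.
    """
    period_s = period_hours * 3600.0
    gm = GM_SUN * star_mass_solar
    return (gm * period_s ** 2 / (4.0 * math.pi ** 2)) ** (1.0 / 3.0)


inner = orbital_distance_m(4.5)
outer = orbital_distance_m(4.9)
inner_in_solar_radii = inner / R_SUN
```

Under the assumed mass, periods of 4.5 to 4.9 hours correspond to orbital distances of roughly 8 to 9 × 10^8 m, a little more than the radius of the present-day Sun.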

The vast majority of exoplanets discovered 
up to now orbit main-sequence stars, which,
like our Sun, are in the prime of their lives.
By contrast, the minor planets discovered by 
Vanderburg et al. are orbiting a member of 
the stellar graveyard, but not for much longer. 
A star like the Sun reaches the end of its life 
when the nuclear fuel in the stellar core is 
exhausted. During this process the Sun will 
expand and become a red-giant star that will 
engulf the inner planets Mercury and Venus. 
Whether the Earth will be swallowed up by the 
bloated Sun is still a matter of debate; however, 
even if the Earth survives, its surface will be 
roasted. Following the red-giant phase, and 
before becoming a white dwarf, the Sun will 
lose a large fraction of its original mass. 

This overall process will destabilize the 
planetary orbits and might cause collisions
between the planets, similar to those that 
occurred during the infancy of the Solar 
System. Some planets might in this way be 
shattered to pieces resembling asteroids. If
such an asteroid wanders too close to the
white dwarf it will be ripped apart by strong
tidal (gravitational) forces, and a circumstellar
dust disk will form of similar chemical
composition to that of the original planetary
core. Such a disk can then be accreted onto the
atmosphere of the white dwarf.

White dwarfs are small but extremely 
dense, and so they have strong gravitational 
fields. Consequently, elements heavier than 
helium (called metals by astronomers) that 
fall into a white dwarf’s pure hydrogen or 
helium atmosphere are expected to sink 
towards the star’s core within a matter of 
days. But astronomers have discovered met- 
als such as carbon, silicon, oxygen and iron 
in the atmospheres of one-third of all known 
white dwarfs, and observations at infrared
wavelengths have revealed that some of these 
stars have circumstellar dust disks’*. Thus 
the atmospheric pollution of white dwarfs by 
metals such as these, which plausibly origi- 
nated in circumstellar disks, provides strong 
evidence that a substantial fraction of white 
dwarfs have devoured broken-up planets or 
asteroids of chemical compositions similar to 
those of terrestrial bodies. After all, carbon, 
silicon, oxygen and iron make up roughly 
93% of Earth’s mass. 

The NASA Kepler space observatory was 
launched in 2009 and since then it has been 
obtaining high-precision photometric meas- 
urements of the brightness of stars in the 
constellations Cygnus and Lyra. By monitoring 
planetary transits, these observations 


22 OCTOBER 2015 | VOL 526 | NATURE | 515 




[Figure 1 image labels: scale bar, 1 Earth radius; trailing tail; leading tail; white dwarf] 

Figure 1 | A minor planet transits a white dwarf. Vanderburg et al.¹ analysed data from the white 
dwarf star WD 1145+017 taken by the Kepler space observatory and ground-based telescopes. The 
data revealed transit features (periodic attenuations of the stellar brightness) that are best explained by 
the passage in front of the star of one or several disintegrating minor planets. The authors performed 
simulations of this process in which a minor planet that orbits very close to the star loses mass in the form 
of dust particles that generate leading and trailing cometary tails (colours indicate dust density). The 
various phases of the transit (leading tail, minor planet core and trailing tail) induce attenuations of the 
stellar brightness of different magnitudes and durations. Adapted from Fig. S7 of ref. 1. 


have led to the detection of hundreds of rocky 
exoplanets, and revolutionized this research 
field. Following technical problems during 
the first part of the mission, a second mission 
(dubbed K2) was planned that included targets 
such as white dwarfs. 

Vanderburg et al. analysed photometric 
observations of WD 1145+017, a white dwarf 
that was observed during K2, and discov- 
ered multiple transit features in the data. The 
authors then used an established statistical 
regression method, and identified transit 
signals induced by bodies that have orbital 
periods of 4.5 to 4.9 hours. These transits 
immediately seemed peculiar: they were 
shallow, which means that they did not cause 
strong dimming of the star’s brightness, and 
they lasted an unusually long time — about 
40 to 80 minutes. Because white dwarfs 
are small (about the size of Earth), a solid 
body that passes in front of the stellar disk is 
expected to induce a short transit event lasting 
only a minute or so. 
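
The "minute or so" figure can be checked with a back-of-the-envelope calculation. The sketch below assumes a circular orbit, a central transit, a transiting body much smaller than the star, a white-dwarf mass of 0.6 solar masses (an assumed typical value, not a number from the study) and a stellar radius equal to Earth's, as the text suggests.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30     # kg
R_EARTH = 6.371e6    # m


def transit_duration_s(period_s, star_mass_kg, star_radius_m):
    """Time for a small body to cross the stellar disk on a central, circular orbit."""
    # Kepler's third law gives the orbital radius from the period.
    a = (G * star_mass_kg * period_s ** 2 / (4.0 * math.pi ** 2)) ** (1.0 / 3.0)
    # Orbital speed on a circular orbit.
    v = 2.0 * math.pi * a / period_s
    # The body has to travel one stellar diameter.
    return 2.0 * star_radius_m / v


duration = transit_duration_s(4.5 * 3600.0, 0.6 * M_SUN, R_EARTH)
print(f"expected transit duration: {duration:.0f} s")
```

With these inputs the crossing takes about 40 seconds, far shorter than the 40 to 80 minutes seen in the K2 data, which is why the observed transits looked so peculiar.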

To clarify the nature of the transits, Van- 
derburg and colleagues further observed 
the system using several ground-based tel- 
escopes. This additional photometric moni- 
toring revealed very deep (40% of the stellar 
light was blocked), short-duration (5-minute) 
asymmetric transits separated by the domi- 
nant 4.5-hour period identified in the K2 
data. The typical geometry of a transit dictates 
that a small spherical disk passing in front of 
a larger spherical disk would yield a perfectly 
symmetrical transit light curve. These obser- 
vations, however, obfuscated interpretations 
even more, because the transit signals seemed 
to be not only different, but also out of phase 
with those seen in the Kepler data, and mor- 
phologically asymmetrical between the initial 
and final parts of the transit. Finally, Vander- 
burg et al. analysed spectra of the white dwarf 
and detected metals in its atmosphere. 
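
A simple geometric estimate shows why a 40% dip is hard to attribute to a solid minor planet alone. In the sketch below the transit depth is taken as the ratio of the two disk areas (no limb darkening); the stellar radius is set to one Earth radius, and the 473-km body radius (roughly that of Ceres) is an assumed example, not a measurement from the study.

```python
import math

R_EARTH_KM = 6371.0


def transit_depth(body_radius_km, star_radius_km):
    """Fraction of starlight blocked by an opaque disk fully in front of the star."""
    return (body_radius_km / star_radius_km) ** 2


def radius_for_depth(depth, star_radius_km):
    """Radius of the opaque disk needed to block the given fraction of light."""
    return math.sqrt(depth) * star_radius_km


ceres_like_depth = transit_depth(473.0, R_EARTH_KM)       # assumed Ceres-sized body
needed_radius_km = radius_for_depth(0.40, R_EARTH_KM)     # 40% dip from the text
print(f"Ceres-sized body: {ceres_like_depth:.2%} dip")
print(f"radius needed for a 40% dip: {needed_radius_km:.0f} km")
```

A Ceres-sized body would dim such a star by well under 1%, whereas a 40% dip requires an opaque region about 4,000 km across in radius, which points to an extended obscuring cloud rather than a bare rock.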


As the authors state, a possible explanation 
of the unusual transit events is one or several 
minor planets in orbit around WD 1145+017 
that are losing material into space as they break 
into pieces. The evaporated material is expelled 
in a wind, forming a cloud of molecules that 
condenses behind the disintegrating body in 
the shape of a cometary tail (Fig. 1). Evapo- 
rating planets have been observed transiting 
main-sequence stars, and those observations 
showed asymmetrical transit profiles 
and variable transit depths like the ones in the 
current study. In all these cases, a dust cloud 
trailing the evaporating body can explain the 
transits’ variable depths, asymmetrical profiles 
and unusually long durations. 

It is extremely exciting that astronomers 
have recorded the final throes of a planetary 
system, and further analysis of its properties 
is warranted. This research will have a transformative 
impact on the exoplanet field and 
will stimulate studies of the chemistry of planetary 
interiors. Future observations of evaporating 
planets and metal-polluted white dwarfs 
might even allow scientists to distinguish 
between material that originated in a planet's 
core as opposed to its mantle. Although Earth's 
final days are a long way into the future, this 
research has allowed us a glimpse of the probably 
inescapable outcome. ■ 


EVOLUTION 


An avian explosion 


The genome sequences of 198 bird species provide an unprecedented 
combination of breadth and depth of data, and allow the most robust resolution 
so far of the early evolutionary relationships of modern birds. SEE LETTER P.569 


GAVIN H. THOMAS 


The fossil record offers a compelling 
narrative of avian evolution. There 
are few known fossils of modern birds 
from the Cretaceous period (around 145 mil- 
lion to 66 million years ago), but most major 
modern-bird lineages are well represented in 
fossils from the Palaeogene (around 66 mil- 
lion to 23 million years ago). It is suggested 
that, following the mass extinction at the end 
of the Cretaceous that famously wiped out 
the non-avian dinosaurs, birds went through 
a (geologically) brief recovery followed by an 
explosive species radiation. But agreement 
between the fossil record and phylogenetic 
trees has been conspicuously absent. On page 
569 of this issue, Prum et al.¹ use genome 
sequences of 198 bird species spanning the 
entire radiation of modern birds to build a 
new phylogeny. The authors’ tree resolves 
the branching order at the origins of modern 
birds and strongly supports a rapid radiation of 
major bird lineages soon after the Cretaceous- 
Palaeogene mass extinction. 

The evolutionary relationships between 
modern birds have been notoriously difficult 
to resolve. The problem for evolutionary 
ornithologists is that ancient divergences 
over short periods are exceedingly difficult to 
tease apart, and this has limited their ability to 
make robust inferences about early bird evo- 
lution. Prum and colleagues use a genomic 
sequencing technique called anchored hybrid 
enrichment to sample highly conserved 
(slowly evolving) regions of the genome and 
faster-evolving flanking regions that together 
are particularly well suited to teasing apart 
rapid, but ancient, radiations. 

Prum and colleagues’ phylogeny differs 
dramatically from another analysis reported 
last year, by Jarvis et al., of an exceptionally 
large data set of more than 40 million base 
pairs of nucleotide sequence data from 
48 avian genomes. Not surprisingly, the 
conflict is focused on the earliest branching 
events that separate major non-passerine taxa 
(Fig. 1). For example, a clade including hum- 
mingbirds, swifts and nightjars is shown to be 
sister to the rest of the Neoaves — a clade that 
includes all living bird species except for Pal- 
aeognathae (such as ostriches and kiwis), Gal- 
liformes (landfowl) and Anseriformes (ducks 
and geese) — rather than sister to grebes and 
flamingos. And an entirely new clade, called 
Aequorlitornithes, is identified with strong 
support and consists of the majority of Neo- 
avian groups of waterbirds. 

Why does the topology of some parts of 
the two trees differ so fundamentally despite 
the use of exceptionally large genomic data 
sets in both studies? One possibility is that an 
explosion of speciation after the Cretaceous- 
Palaeogene extinction saw all major lineages of 
birds branch off near-simultaneously. Indeed, 
the early diversification of birds may have been 
so rapid that it resembles a network, or bush, 
rather than a beautifully bifurcating tree of 
life. Recent support for this idea comes from 
the finding that a process called incomplete 
lineage sorting (ILS) was rampant when the 
major lineages of birds diversified. The effect 
of ILS is that different parts of the genome 
yield different evolutionary relationships and 
produce a pattern akin to tangled roots, rather 
than a tree. ILS is usually identified only in 
recent species radiations, but large genomic 
data sets allow for detailed tests that delve 
deeper into the evolutionary past. Although 
the difference between the phylogenies could 
simply result from the two data sets having 
sampled different parts of the genome that 
happen to be incongruent as a result of ILS, 
this seems unlikely given the vast amount of 
genomic data sequenced in each study. 
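
Why ILS becomes rampant in a rapid radiation can be illustrated with the standard three-species result from coalescent theory: the chance that a gene tree disagrees with the species tree is (2/3)·exp(−T), where T is the length of the internal branch in coalescent units (generations divided by twice the effective population size, for diploids). The sketch below is a generic illustration; the population size and branch lengths are hypothetical and are not taken from either bird study.

```python
import math


def discordance_probability(branch_generations, effective_pop_size):
    """Probability that a three-taxon gene tree conflicts with the species tree.

    T = t / (2 * N_e) is the internal branch length in coalescent units;
    the two discordant topologies together have probability (2/3) * exp(-T).
    """
    t_coalescent = branch_generations / (2.0 * effective_pop_size)
    return (2.0 / 3.0) * math.exp(-t_coalescent)


N_E = 100_000  # hypothetical effective population size
for generations in (0, 50_000, 200_000, 1_000_000):
    p = discordance_probability(generations, N_E)
    print(f"{generations:>9} generations between splits -> {p:.3f} discordant")
```

When successive splits are separated by very few generations the discordant fraction approaches two-thirds, so most individual loci would be expected to disagree with the true branching order.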

Instead, Prum et al. explore an alternative 
and perhaps more likely cause for discrepancies 
between phylogenetic hypotheses: a 
phenomenon called long-branch attraction. 
Long-branch attraction occurs when distant 
evolutionary relatives are incorrectly inferred 
to be close relatives; this can arise if evolution 
has proceeded at an exceptionally high rate, or 
when one lineage has no close relatives. Prum 
et al. deal with long-branch attraction by sampling 
both deeply (the number of nucleotides 
sequenced) and broadly (the number of species). 
This strategy is rooted in well-established 
systematic theory, which shows that sampling 
more species can break long branches. When 
the 198-species data set of Prum et al. is 
reduced to include only the 48 species in Jarvis 
and colleagues’ data, Prum and colleagues’ 
phylogeny breaks down because many relationships 
change fundamentally. The breadth 
of sampling is critical. 


Figure 1 | A comparison of avian phylogenies. The phylogenetic relationships presented by Prum et al. 
and Jarvis et al. have here been distilled down to major bird lineages. The comparison reveals several 
differences between the postulated evolutionary relationships of taxa. 

A well-resolved phylogeny is the basis of 
robust dating. Congruence between dates from 
molecular phylogenies and the fossil record is 
a rare thing, and for birds it is likely to prove 
controversial. Prum and colleagues’ conclusion 
of an explosive radiation after the 
Cretaceous–Palaeogene mass extinction is markedly 
different from the conclusions of many previ- 
ous molecular studies, which typically suggest 
that most major avian orders and many fami- 
lies originated further back in the Cretaceous 
period (for examples, see refs 9-11). 

Although genomic-scale data can add pre- 
cision to dating estimates, accuracy relies on 
the quality of the fossil material and our ability 
to place it in the correct evolutionary context. 
Phylogenetic trees are calibrated using the 
fossil record with all its inherent imperfec- 
tions. But the fossil record of birds is patchy, 
incomplete and geographically biased. For 
most divergences no fossil evidence is avail- 
able, and for others the fossil record probably 
underestimates the true age of origination, 
because the fossils discovered for a particu- 
lar group are likely to be younger than the 
age of divergence. The age of the root of the 
avian tree is particularly contentious, with 
different calibrations placing the explosive 
radiation of Neoaves either before or after the 
Cretaceous–Palaeogene mass extinction. In 
the absence of a perfect fossil record, the best 
we can do is experiment with different cali- 
bration dates and levels of uncertainty around 
those dates. The new genomic data sets await 
extensive experimentation of this type, and 
so, as compelling as the historical narrative 
may be, it is perhaps for now best treated with 
hopeful caution. ■ 
Non-coding recurrent mutations in 
chronic lymphocytic leukaemia 

…, Alfonso Valencia, Nuria Lopez-Bigas, David Torrents, Ivo Gut, Armando Lopez-Guillermo, Carlos Lopez-Otin & Elias Campo 


Chronic lymphocytic leukaemia (CLL) is a frequent disease in which the genetic alterations determining the 
clinicobiological behaviour are not fully understood. Here we describe a comprehensive evaluation of the genomic 
landscape of 452 CLL cases and 54 patients with monoclonal B-lymphocytosis, a precursor disorder. We extend the 
number of CLL driver alterations, including changes in ZNF292, ZMYM3, ARID1A and PTPN11. We also identify novel 
recurrent mutations in non-coding regions, including the 3’ region of NOTCH1, which cause aberrant splicing events, 
increase NOTCH activity and result in a more aggressive disease. In addition, mutations in an enhancer located on 
chromosome 9p13 result in reduced expression of the B-cell-specific transcription factor PAX5. The accumulative 
number of driver alterations (0 to ≥4) discriminated between patients with differences in clinical behaviour. This 
study provides an integrated portrait of the CLL genomic landscape, identifies new recurrent driver mutations of the 
disease, and suggests clinical interventions that may improve the management of this neoplasia. 


CLL is a B-cell neoplasia that exhibits a very heterogeneous course, with 
some patients following an indolent disease course, clearly contrasting 
with others experiencing an aggressive disease. Patients have been 
classically categorized in two groups, depending on whether their 
tumour B cells express B-cell receptor (BCR) immunoglobulin with 
immunoglobulin heavy variable (IGHV) genes bearing somatic hypermutation 
(IGHV-mutated) or not (IGHV-unmutated). Further studies 
have led to the identification of additional biological features with 
prognostic value for CLL patients. However, the molecular mechanisms 
responsible for the initiation and heterogeneous evolution of CLL 
remain largely unknown. 

Whole-genome sequencing (WGS) and whole-exome sequencing 
(WES) studies in CLL patients have identified recurrently mutated 
genes such as NOTCH1, SF3B1, TP53, BIRC3 and POT1, and delineated 
clonal evolution events in this neoplasia. Moreover, recent works 
have profiled the transcriptome and the DNA methylome of many CLL 
cases. Nevertheless, these studies have unveiled a high level of 
molecular heterogeneity, thus creating the need for integrated analysis 
of different genomic parameters in a larger number of patients. In this 
work, and as part of the International Cancer Genome Consortium 
(ICGC) project, we have performed a comprehensive analysis of the 
genetic alterations driving the oncogenic transformation in 506 patients 
with monoclonal B-lymphocytosis (MBL) or CLL. We have also carried 
out additional genomic studies involving single nucleotide polymorph- 
ism (SNP) arrays, DNA methylation arrays, RNA sequencing (RNA-seq) 
analyses and gene expression arrays. Finally, we have performed clinical 
studies aimed at translating the observed molecular alterations into clin- 
ical applications for CLL patients. 


Mutational signatures in CLL subtypes 


We studied pre-treatment tumour and matched non-tumour 
samples from 506 patients (452 CLL and 54 MBL): 317 (62%) were 
IGHV-mutated (IGHV-MUT), 179 (35%) IGHV-unmutated (IGHV- 
UNMUT), and 10 (2%) undetermined (Extended Data Table 1 and 
Supplementary Table 1). We performed WGS of 150 tumour/normal 
pairs, and WES of 440 cases (including 84 with both WGS and 
WES data). Somatic mutations analysed using the Sidron pipeline 
revealed the presence of 359,456 substitutions and small indels in 
WGS analyses (240–5,416 per tumour), and an average mutation 
burden of 0.87 mutations per megabase (Mb) (Extended Data Fig. 1 
and Supplementary Table 2). CLL and MBL samples had a similar 
mutation burden (0.87 versus 0.89 mutations Mb⁻¹, respectively, 
P = 0.8), and were considered together for WGS analysis. The number 
of somatic substitutions (excluding IG loci) was higher in IGHV- 
MUT tumours than in IGHV-UNMUT cases (2,847 versus 1,975, 
P<3xX 10 °) (Extended Data Fig. 1). Three main mutational signa- 
tures were identified (Extended Data Fig. 1): an age-related signature 
involving C-to-T transitions at CpG sites; signature 2, characterized 
by T:A > G:C transversions; and an activation-induced cytidine deaminase 
(AID) signature. This latter pattern was only detected on 
IG loci, although we also confirmed AID-induced mutations in 
some off-target genes highly expressed in the germinal centre. 
Signature 2 was almost exclusively present in IGHV-MUT tumours, 
and its presence clearly separated IGHV-MUT from IGHV-UNMUT 
tumours (Extended Data Fig. 1). 


Landscape of somatic mutations 

We combined somatic mutations from the 506 tumour/normal pairs 
detected by either WGS or WES (excluding IG genes), resulting in a 
total of 13,631 somatic mutations affecting protein-coding genes 
(average 26.9 per tumour) and 951 copy number alterations 
(CNAs) (average 1.9) (Fig. 1 and Supplementary Table 3). We iden- 
tified 36 genes (tier 1) as recurrently mutated in CLL (false discovery 
rate (FDR) < 10%), and 23 additional genes (tier 2) were significantly 
mutated in one subgroup (IGHV-MUT or IGHV-UNMUT), had 
recurrent or truncating mutations, or had driver mutations described 
in other malignancies (Extended Data Table 2). Two genes (BTG2 and 
DTX1) were excluded as they are known targets of the somatic 
hypermutation (SHM) machinery. The remaining genes included most of the drivers previously 
described by different WES studies. The most frequently 
mutated gene in CLL was NOTCH1 (57 cases, 12.6%), followed by 
ATM (11%), SF3B1 (8.6%), BIRC3 (8.8%), CHD2 (6%), TP53 (5.3%) 
and MYD88 (4%). Furthermore, we identified 12 novel genes recur- 
rently mutated in CLL and not previously linked to this disease, 
including ZNF292, ARID1A, ZMYM3 and PTPN11. Most CLL driver 
genes were preferentially mutated in IGHV-UNMUT tumours and 
had subclonal mutations (Supplementary Fig. 1). Notably, a similar 
frequency of mutated drivers was found in CLL and MBL cases of 
similar IGHV gene SHM status (Extended Data Table 2). 
We also identified some genes (tier 3) that probably contain driver 
mutations but were found in three or fewer CLL patients. This is the case 
of activating mutations in the oncogenes KRAS and NRAS, truncating 
mutations in the tumour suppressors CDKN1B and CDKN2A, and 
recurrent mutations in the transcription factor IKZF3. Mutations in 
components of the BCR and Toll-like receptor pathway were exclu- 
sively present in IGHV-MUT tumours. They included those in 
MYD88, CD79A, CD79B, TLR2 and IRAK1, detected in 22 of the 
278 IGHV-MUT cases, but in none of the 166 IGHV-UNMUT CLL 
patients (P = 4.1 X 10°), confirming the importance of the BCR and 
Toll-like receptor pathways both in CLL pathobiology and as therapeutic 
targets. Collectively, eight main pathways are frequently 
altered in CLL, including BCR signalling, cell cycle regulation, apoptosis, 
DNA damage response, chromatin remodelling, NF-κB signalling, 
NOTCH1 signalling, and RNA metabolism (Fig. 1). 
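
The strength of the exclusivity reported above (22 of 278 IGHV-MUT cases versus 0 of 166 IGHV-UNMUT cases) can be gauged with a simple hypergeometric calculation. The sketch below computes the one-sided tail of Fisher's exact test, that is, the chance that all 22 mutated cases land in the IGHV-MUT group if mutations were unrelated to IGHV status. It is offered as an illustration only; the statistical test actually used in the study is not specified here, so the result need not match the published P value.

```python
from math import comb


def prob_all_in_first_group(n_first, n_second, n_mutated):
    """Hypergeometric probability that every mutated case falls in the first group.

    Equals C(n_first, k) / C(n_first + n_second, k): the number of ways to pick
    all k mutated cases from the first group over all ways to pick k cases.
    """
    return comb(n_first, n_mutated) / comb(n_first + n_second, n_mutated)


# Counts taken from the text: 278 IGHV-MUT, 166 IGHV-UNMUT, 22 mutated cases.
p_one_sided = prob_all_in_first_group(278, 166, 22)
print(f"one-sided probability: {p_one_sided:.1e}")
```

Under the null hypothesis such a lopsided split is expected only a few times in every hundred thousand trials, in keeping with the conclusion that these pathway mutations are specific to IGHV-MUT tumours.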


DNA structural alterations 


Analysis of structural variants confirmed the presence of known CNAs 
such as loss of 13q14, 11q22-q23, 17p, 6q15-q21 and trisomy 12 
(Extended Data Fig. 2 and Supplementary Table 4). In addition, we 
identified novel candidate CLL driver genes in regions of recurrent 
chromosomal alterations (Fig. 1). They included deletions involving 
ZNF292 at 6q15 (2.4%), deletions of 2q37 encompassing SP140 and 
SP110, loss of 3p21 (2%) affecting SMARCC1 and SETD2, and loss of 
10q24 (1.8%) involving NFKB2 (Supplementary Fig. 2). 

Unlike other B-cell malignancies, translocations involving IG 
genes were uncommon in CLL with the exception of BCL2 rearrange- 
ments (10 cases). They occurred exclusively in IGHV-MUT cases, 
and resulted in overexpression of BCL2 and recruitment of the 
SHM machinery (Extended Data Fig. 3). Analysis of WGS data using 
SMUFIN also revealed the presence of 147 interchromosomal trans- 
locations in 43 out of 148 cases (Supplementary Table 5). Recurrent 
translocations involving chromosome 13q14 with different chromo- 
somal partners and associated with deletion or disruption of the 
microRNA cluster miR-15a/miR-16 were identified in nine cases 
(P < 10⁻⁸). We also detected 15 non-recurrent chromosomal trans- 
locations, one of them involving the IG locus (IGH-CBFA2T3), and 
14 predicted to originate in chimaeric genes, five of which could be 
confirmed by RNA-seq (Supplementary Table 5). 

Complex rearrangements (chromothripsis/chromoplexy) were 
identified in 15 out of 452 CLL cases (Extended Data Fig. 3), being 
more frequent in IGHV-UNMUT than in IGHV-MUT tumours 
(6% versus 1.8%, P< 0.05). Although these complex alterations did 
not result in any recurrent rearrangement, we observed involvement 
of chromosome 13 in 4 out of 15 tumours, resulting in mir-15a/mir-16 
loss. Similar to previous studies, mutations in TP53 were more fre- 
quent in tumours with chromothripsis (26% versus 4.6%, P < 0.006). 
Furthermore, SETD2 inactivation was more frequent in CLL cases 
with chromothripsis than in non-chromothriptic cases (26% versus 
1.4%, P<2X10 *). 

This analysis revealed significant relationships between several 
alterations, including co-occurrence of NOTCH1 mutations and 
chromosome 12 trisomy, trisomy 12 with trisomy 18 (q < 0.01), 
and the mutually exclusive pattern of 13q14 deletion and trisomy 
12 (q< 0.01). We also observed a higher co-occurrence of mutations 
in NOTCH1 with those in MGA (q<0.01), BCOR (q<0.01) and 
BIRC3 (q < 0.05), or gain of 2p16 with loss of 18p (q < 0.01), among 
others (Supplementary Fig. 3). 


Mutations in non-coding regions 


The presence of functional mutations outside of protein-coding 
regions remains an open question in cancer research. We observed 
in one CLL case a previously described mutation in the TERT promoter 
(C228T). Eight mutations in mir-142 were identified in five 
cases (Supplementary Fig. 4), with seven of them within AID target 
consensus (WRCY or WA), reinforcing it as a target of the SHM. 
We also identified 88 mutations in non-coding regions present in at 
least two WGS cases (Supplementary Table 6). Most of them were 
located either within hypermutated late-replication regions or 
within the 5’-region of BACH2, BCL6, BTG2, CXCR4 and TCL1A, 
genes known to undergo SHM during the germinal centre reaction. 
Most mutations were within the AID target sequence 
(WRCY), probably reflecting the passage of the respective progenitor 
cells through the germinal centre. 
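
The WRCY consensus is written in IUPAC nucleotide codes (W = A or T, R = A or G, Y = C or T). As a minimal sketch of how such a motif can be located in sequence, the following scans both strands, using RGYW as the reverse complement of WRCY. The function name and the example sequences are hypothetical, and this is not the pipeline used in the study.

```python
import re

# IUPAC expansion: W = [AT], R = [AG], Y = [CT].
# Lookahead patterns allow overlapping matches to be reported.
WRCY = re.compile(r"(?=[AT][AG]C[CT])")   # motif on the given strand
RGYW = re.compile(r"(?=[AG]G[CT][AT])")   # reverse complement of WRCY


def aid_hotspots(sequence):
    """Return sorted (position, strand) pairs for WRCY motifs on either strand."""
    seq = sequence.upper()
    hits = [(m.start(), "+") for m in WRCY.finditer(seq)]
    hits += [(m.start(), "-") for m in RGYW.finditer(seq)]
    return sorted(hits)


print(aid_hotspots("GGAACTGG"))  # made-up sequence containing one AACT motif
print(aid_hotspots("AGCT"))      # AGCT is palindromic, so it matches on both strands
```

Counting how many observed mutations fall on such positions, compared with how many would be expected from the motif's frequency in the region, is one way to judge whether a mutation cluster bears the hallmark of AID activity.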

Notably, the most frequent recurrent non-coding mutation was 
detected in the 3’ UTR of NOTCH1 (chr9: 139390152T > C), present 
in 4 of the 150 cases with WGS data (Fig. 2a). Sequencing of this 
region in 356 cases with only WES data revealed seven additional 
tumours with the same mutation, and two cases with a mutation seven 
or nine bases downstream of the original one. RNA-seq from six of 
these 3’ UTR NOTCH1-mutated tumours confirmed the presence of a 
novel splicing event within the last exon of NOTCH1 (Fig. 2a), which 
was absent in 290 tumours without these mutations (Extended Data 
Fig. 4). This splicing event occurred preferentially between a cryptic 
donor site located in the coding region of the last exon of NOTCH1 
and a newly created acceptor site in the 3’ UTR, resulting in a deletion 
that includes the last 158 coding bases. Nevertheless, some splicing 
events occurred between the canonical donor site on exon 33 and the 
newly created acceptor site in the 3’ UTR of exon 34 (Fig. 2a). Reverse 
transcription PCR (RT-PCR) analysis confirmed the presence of this 
aberrant splicing only in cases with mutations in the 3’ UTR (Fig. 2b). 
This within-exon splicing is predicted to remove a PEST domain of 
NOTCH1 and to increase protein stability, as in the previously 
described p.P2514Rfs*4 NOTCH1 mutation. Western blot analysis 
confirmed the presence of a smaller molecular mass band in 3’-UTR- 
and p.P2514Rfs*4-mutated cells, which was absent in cells without 
mutations in NOTCH1 (Fig. 2c). Immunohistochemical analysis 
showed a strong NOTCH nuclear signal in tumour cells from 
patients with 3’ UTR or p.P2514Rfs*4 mutations (Extended Data 
Fig. 4). All cases with mutations in the 3’ UTR of NOTCH1 belonged 
to the IGHV-UNMUT subgroup, accounting for up to 6.7% (12 out of 
179) of all IGHV-UNMUT cases. Patients with 3’ UTR NOTCH1 
mutations had features of adverse prognosis (Extended Data Fig. 4) 
and behaved similarly to patients with coding mutations in NOTCH1 
in terms of the time to first treatment (TTT) and overall survival 
(Fig. 2d, e). 

We further explored the presence of genome regions with high 
mutational density and found 24 loci enriched in somatic mutations 
(Fig. 3a). Most of them correspond either to recurrently mutated 
genes in CLL or to known targets of the SHM process. However, we 
identified a densely mutated cluster in a small intergenic region of 
chromosome 9p13, in which 17 different tumours had somatic 
mutations (Fig. 3b). This region is enriched for both lymphocyte- 
specific transcription factor binding sites and histone marks 
related to enhancer elements only in a lymphoblastoid B-cell line 
(Supplementary Fig. 5). DNase-seq and chromatin immunoprecipi- 
tation sequencing (ChIP-seq) analysis in normal B cells and CLL cases 
revealed that the region contains an active enhancer characterized by 
a DNase I hypersensitive site and nucleosomes containing histone 3 
Lys4 methylation (H3K4me1) and H3K27 acetylation (H3K27ac) 
(Fig. 3b and Supplementary Fig. 5). Chromosome conformation cap- 
ture sequencing (4C-seq) analysis in tumour cells from two CLL 
patients revealed that this potential enhancer shows high three- 
dimensional contact frequencies extending towards the telomere up 
to the PAX5 locus, located 330 kilobases (kb) away (Fig. 3c and 
Supplementary Fig. 5). Expression analysis of 15 genes located within 
1 Mb of this element revealed that the only gene showing a significant 
expression difference correlated with the presence of mutations 
within the putative enhancer region was indeed PAX5 (average 
expression 87 versus 131, P=1.9 X 10 *) (Extended Data Fig. 5). 
PAX5 encodes a transcription factor that has an essential role in 
B-cell differentiation and, based on the evidence provided above, 
is the most likely target of the identified enhancer region. CRISPR/ 
Cas9-based genome editing of this region allowed us to demonstrate 
that either the introduction of a specific point mutation, or the dele- 
tion of this putative enhancer in a lymphoblastoid B-cell line or in 
RAMOS cells, resulted in a 40% reduction in the expression of PAX5 
(Extended Data Fig. 6). 

Sequencing of this region in all CLL cases with WES data identified 
25 new cases with somatic mutations. We also found somatic muta- 
tions in this enhancer in diffuse large B-cell lymphomas (29%, 26 
out of 89), follicular lymphomas (23%, 20 out of 86) and mantle-cell 
lymphomas (5%, 3 out of 66) (Supplementary Table 7). Interestingly, 
84% of CLL cases with mutations in this enhancer belong to the 
IGHV-MUT subgroup, accounting for up to 13% of IGHV-MUT 
CLL cases. Mutations in the PAX5 enhancer were the only recurrent 
alteration observed in 7 cases, while in 11 tumours this alteration was 
only combined with 13q14 deletion, raising the possibility that PAX5 
enhancer mutations might constitute driver events contributing to the 
development of these tumours. 


Integrative analysis 


We then integrated the standard genetic classification of CLL with 
a recent patient categorization in three subgroups based on a DNA 
methylation signature of naive and memory B cells (Supplementary 
Table 1). The three epigenetic subgroups showed a distinct 
distribution of genetic changes, IGHV gene repertoire and stereo- 
typed B-cell receptors (Extended Data Fig. 7). The intermediate group 
had moderate IGHV mutation levels, an intermediate contribution of 
signature 2 mutations, higher frequencies of SF3B1 and MYD88 
mutations, biased usage of the IGHV-3-21 and IGHV-1-18 genes 
and increased frequency of stereotyped subset #2. These results 
support the hypothesis that this group has a distinct genetic and 
epigenetic makeup. We also found a highly significant correlation 
(r = 0.64, P < 0.001) between the number of WGS mutations per 
case and the number of CpGs showing differential methylation as 
compared to naive B cells (Extended Data Fig. 7). Similarly, the proportion 
of signature 2 mutations was also correlated with differential 
methylation in IGHV-MUT cases. 


Figure 3 | Identification of somatic mutations 
in a PAX5 enhancer. a, Regions with a high 
density of somatic mutations in 150 WGS analyses. 
Regions correspond to recurrently mutated genes 
(green), targets of SHM (red/orange), and other 
regions (blue). b, Detailed view of a 9p13 region 
showing the accumulation of somatic mutations 
(arrowheads) in CLL tumours as well as DNase I 
hypersensitivity and histone H3K27ac, H3K4me1 
and H3K4me3 enrichment from CLL tumour 110. 
c, 4C-seq analysis in CLL cells showing the 
interaction frequencies of the enhancer with the 
surrounding regions. p%, percentile. 

MBL cases were indistinguishable at the genomic, transcriptomic 
and epigenomic level from CLL cases assigned to the same IGHV 
subgroup (Extended Data Fig. 7 and Extended Data Table 2), in 
accordance with the overlapping biological features of both processes. 
Notably, the burden of driver alterations was significantly lower in 
patients with MBL than with CLL (1.2 versus 1.7, for IGHV-MUT 
cases, P=8 X 10“), consistent with a model in which MBL/CLL 
evolution is accomplished by the progressive accumulation of driver 
alterations. 


Clinical implications 

Our data support the hypothesis that the observed genomic differ- 
ences between the two major molecular subgroups of CLL might be in 
part responsible for their different outcome. The average number of 
driver mutations in IGHV-UNMUT tumours was higher than in 
IGHV-MUT cases (3.5 versus 1.7, P< 10 1”), despite the 44% higher 
mutational burden of IGHV-MUT tumours. We found that 88% of 
cases had at least one driver mutation, with almost all IGHV- 
UNMUT tumours containing at least one driver alteration, while a 
smaller fraction was found in the IGHV-MUT subgroup (96% versus 
83%, P<5X10°). 

We evaluated the influence of the presence of each alteration on the 
TTT and overall survival from the time of sampling. The mutation of 
several drivers and CNAs was significantly correlated with an adverse 
prognosis, in some cases independently from Binet stage and IGHV 
mutational status (Fig. 4a, Extended Data Fig. 8 and Supplementary 
Table 8). We confirmed the independent prognostic value of known 
gene mutations (SF3B1 and TP53), and identified novel independent 
prognostic drivers for both shorter TTT (BRAF, ZMYM3, IRF4, 
NFKB2, 20p deletion, and 2p16 and 5q34 gains), and overall survival 
(ASXL1, POT1 and 14q24 deletion). Remarkably, the accumulative 
number of drivers (0 to ≥4) per tumour had a progressively worse 
effect on outcome that could discriminate patient subsets differing by 
more than 10 years in the median TTT, independently of IGHV 
status and Binet stage. They also showed prognostic value for overall 
survival, although not independent in the multivariate analysis 
(Fig. 4b, c). Finally, we examined the potential druggability of the 
alterations in genes and pathways identified in CLL patients, finding 
candidate drugs for 19 of the 59 driver genes in 42% of the CLL 
cases (190 out of 452) (Supplementary Fig. 6 and Supplementary 
Tables 9 and 10). 


Figure 4 | Prognostic effects of individual alterations and number of 
drivers. a, Effect on overall survival (left) and time-to-treatment (right) for 
each genomic alteration. Labels including genes and chromosomal regions 
represent combined analysis of mutations and copy number alterations. 
Hazard ratios and 95% confidence intervals are shown. Alterations conferring 


Discussion 


In this work, we have provided a comprehensive and integrated 
molecular characterization of CLL. We have also unveiled new bio- 
logical aspects of this disease and identified novel driver genes pre- 
sumably implicated in its pathogenesis. The large number of different 
genomic alterations found in our cohort illustrates the enormous 
biological heterogeneity of CLL. Notably, the use of WGS has allowed 
us to identify recurrent mutations in non-coding regions, including 
the 3′ UTR of NOTCH1 and a PAX5 enhancer, resulting in marked
alterations in the activity of these transcription factors of well-known
importance in leukaemia and other malignancies. Previous studies
have shown the effect of NOTCH1 mutations in CLL prognosis.
However, these studies may seriously underestimate the true
incidence of NOTCH1 deregulation in CLL, given our finding that
about 20% of NOTCH1-mutated tumours contain mutations in the 3′
non-coding region. These findings emphasize the value of large 
genome-wide studies to discover new molecular alterations that 
may have a profound effect on cancer development and progression. 

The evaluation of putative associations between these molecular 
alterations and the clinicopathological features of our cohort of CLL 
patients has been challenging owing to the low frequency of many 
significantly mutated genes. Patients in whom no recurrent alterations
were found had the best prognosis and near-normal overall survival,
suggesting that this study has uncovered most driver alterations 
involved in CLL evolution, opening new avenues to explore the clin- 
ical impact of the heterogeneous molecular composition of the disease 
in independent cohorts. Hopefully, this work will finally result in new 
opportunities for improving the clinical management and persona- 
lized treatment of CLL patients. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 

Patients. The clinical and biological characteristics of the 506 patients are shown 
in Extended Data Table 1. Among these patients, 452 were diagnosed with CLL 
and 54 with MBL. Cases were defined as IGHV-MUT when the identity of
immunoglobulin genes with the germline was less than 98%. The tumour
samples were obtained before administration of any treatment. All patients
gave informed consent for their participation in the study following the
International Cancer Genome Consortium (ICGC) guidelines and the ICGC
Ethics and Policy committee.
Collection and preparation of samples. Tumour samples were obtained from 
fresh or cryopreserved mononuclear cells. To purify the CLL or MBL fraction, 
samples were incubated with a cocktail of magnetically labelled antibodies direc- 
ted against T cells, natural killer cells, monocytes and granulocytes (CD2, CD3, 
CD11b, CD14, CD15 and CD56), adjusted to the percentage of each contaminat- 
ing population (AutoMACS, Miltenyi Biotec). The degree of contamination by 
non-CLL cells in the CLL fraction was assessed by immunophenotype and flow 
cytometry. DNA was extracted from purified samples by using a Qiagen kit, and 
the quality of purified DNA was assessed by SYBR-green staining on agarose gels 
and quantified using a Nanodrop ND-100 spectrophotometer. The tumour DNA 
and RNA samples for further genomic analysis contained ≥95% neoplastic cells 
and the contamination by neoplastic cells in normal DNA was <2%. 

WGS, WES and RNA-seq. For WGS, 2 μg of genomic DNA from each sample
was used for the construction of two short-insert paired-end sequencing libraries.
One library was prepared using a standard TruSeq DNA Sample Preparation Kit
v2 (Illumina Inc.) with some modifications. In short, following fragmentation
(Covaris E220), the libraries were size-selected on an agarose gel and processed
through end-repair, adenylation and indexed adaptor ligation. The gel eluate
was directly amplified by 10 PCR cycles. The second library was prepared
following the same protocol, except that it included a heating step to 72 °C
before adaptor ligation, after which it was rapidly cooled to 4 °C. This resulted
in a proportion of reads biased towards high GC content and counterbalanced
some of the GC bias of Illumina’s PCR-based sample preparation methods, thus
improving coverage of GC-rich regions of the genome. Both types of libraries were
sequenced in paired-end mode on Illumina GAIIx (2 × 151 bp) using Sequencing
kit v4 or Illumina HiSeq2000 (2 × 101 bp) using TruSeq SBS Kit v3 (Illumina Inc.).

For other samples (Supplementary Table 1), the library preparation procedure 
was modified to remove the PCR step during short-insert paired-end library 
preparation. The TruSeq DNA Sample Preparation Kit v2 (Illumina Inc.) and 
the KAPA Library Preparation kit (Kapa Biosystems) were used. In brief, 2 μg of 
genomic DNA was sheared on a Covaris E220, size-selected and concentrated 
using AMPure XP beads (Agencourt, Beckman Coulter) to reach the fragment 
size of 220-480 bp. Fragmented DNA was end-repaired, adenylated and ligated to 
Illumina specific indexed paired-end adaptors. All libraries were quantified by 
Library Quantification Kit (Kapa Biosystems). Each library was sequenced using 
TruSeq SBS Kit v3-HS (Illumina Inc.), in paired-end mode, 2 × 101 bp, in three
sequencing lanes of HiSeq2000 flowcell v3 (Illumina Inc.) according to standard
Illumina operation procedures with a minimum yield of 85 Gb for each sample.
Primary data analysis was carried out with the standard Illumina software Real 
Time Analysis (RTA 1.13.48) and followed by generation of FASTQ files. 

For WES, 3 μg of genomic DNA from each sample was sheared and used for
the construction of a paired-end sequencing library as described in the paired-end
sequencing sample preparation protocol provided by Illumina. Enrichment of
exonic sequences was then performed for each library using either the Sure Select 
Human All Exon 50 Mb or All Exon+UTRs v4 kits (Supplementary Table 1) 
following the manufacturer’s instructions (Agilent Technologies). Exon-enriched 
DNA was pulled down by magnetic beads coated with streptavidin (Invitrogen), 
followed by washing, elution and 18 additional cycles of amplification of the 
captured library. Enriched libraries were sequenced (2 × 76 bp) in one lane of 
an Illumina GAIIx sequencer or in two lanes of a HiSeq2000 when using pools of 
eight samples. 

RNA was assayed for quantity and quality using Qubit RNA HS Assay (Life 
Technologies) and RNA 6000 Nano Assay on a Bioanalyzer 2100. RNA-seq 
libraries were prepared from total RNA using the TruSeq RNA Sample Prep 
Kit v2 (Illumina Inc.) with minor modifications. In brief, 0.5 μg of total RNA 
was used as the input material for poly-A-based messenger RNA enrichment with 
oligo-dT magnetic beads. Selected mRNA was fragmented (resulting RNA frag- 
ment size was 80-250 nucleotides, with the major peak at 130 nucleotides). After 
first and second strand cDNA synthesis the double-stranded complementary
DNA was end-repaired and 3′-adenylated, and the 3′ 'T' nucleotide of the adaptor
was used for ligation of the Illumina indexed adaptors. The ligation product was
enriched by 10 cycles of PCR. Each library was sequenced using TruSeq SBS Kit
v3-HS, in paired-end mode with a read length of 2 × 76 bp. We generated more
than 20 million paired-end reads for each sample in a fraction of a sequencing
lane on HiSeq2000 (Illumina Inc.) following the manufacturer’s protocol. Image
analysis, base calling and quality scoring of the run were processed using the
manufacturer’s software Real Time Analysis (RTA 1.13.48) and followed by
generation of FASTQ sequence files.
Read mapping and processing. For WGS and WES, reads from each library were 
mapped to the human reference genome (GRCh37) using BWA with the sampe
option, and a BAM file was generated using SAMtools. Reads from the same
paired-end libraries were merged, and optical or PCR duplicates were flagged 
using Picard (http://picard.sourceforge.net/index.shtml). For the identification of 
somatic substitutions and indels, we used the Sidron algorithm. This algorithm 
was adapted to identify subclonal mutations in which the mutant allele fraction 
is low, but supported by at least three reads. Visual inspection of recurrent 
mutational hotspots allowed the inclusion of some somatic mutations that were 
originally discarded owing to the presence of an excess of mutant reads in the 
non-tumour sample, or owing to low coverage, especially in the case of NOTCH1, 
in which a high GC content on exon 34 usually resulted in very low coverage by 
WES. In samples in which NOTCH1 coverage was too low to make a call, mutations
were analysed by Sanger sequencing. A comparison of mutation calls by
Sidron and by Sanger sequencing of some of the most frequently mutated genes in 
CLL (SF3B1, TP53, MYD88) revealed more than 97% specificity and at least 90% 
sensitivity. Mutational signatures were extracted using the WTSI Mutational 
Signature Framework. To estimate the presence of subclonal mutations in 
recurrently mutated genes, the fraction of reads supporting a mutant allele was 
calculated for those mutations in which the depth of coverage was at least 20 
reads. Flow cytometry analysis confirmed that the percentage of tumour cells was 
at least 98%. A case was considered as having a clonal mutation when at least 80% 
of cells were estimated to contain the mutation, and the mutant allelic fraction 
was within the 95% confidence interval. 
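
As a rough illustration of the clonality estimate described above, the following sketch converts a mutant allelic fraction into an estimated fraction of tumour cells carrying the mutation. It assumes a heterozygous mutation in a diploid region and a known tumour purity. The function names, the default purity of 0.98 and these simplifying assumptions are illustrative and are not taken from the Methods, and the 95% confidence-interval criterion is not modelled.

```python
def cancer_cell_fraction(mutant_reads, total_reads, purity=0.98):
    """Estimate the fraction of tumour cells that carry a mutation.

    Assumption of this sketch: the mutation is heterozygous in a diploid
    region, so a fully clonal mutation has an expected allelic fraction
    of purity / 2.
    """
    if total_reads < 20:
        # The text only evaluates mutations covered by at least 20 reads.
        raise ValueError("depth of coverage below 20 reads")
    allelic_fraction = mutant_reads / total_reads
    # Cap at 1.0 because sampling noise can push the estimate above 100%.
    return min(1.0, 2.0 * allelic_fraction / purity)


def is_clonal(mutant_reads, total_reads, purity=0.98, threshold=0.80):
    """Apply the 'at least 80% of cells' rule to the estimated fraction."""
    return cancer_cell_fraction(mutant_reads, total_reads, purity) >= threshold
```

Under these assumptions, a mutation seen in 45 of 100 reads would be called clonal, whereas one seen in 20 of 100 reads would be treated as subclonal.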
Analysis of CNAs and structural variants. For the identification of CNAs, 
tumour and normal DNA from 505 CLL patients were analysed using 
Affymetrix SNP6.0 microarrays (Affymetrix) as previously described. SNP array 
experiments were carried out at CeGen (http://www.cegen.org). Additionally, for 
230 cases array-comparative genomic hybridization was performed in SurePrint 
G3 Human aCGH Microarray 1M (Agilent Technologies). Array-comparative 
genomic hybridizations were performed at qGenomics (http://www.qgenomics.com).
Nexus 6.0 Discovery Edition software (Biodiscovery) was used for global
analysis and visualization. Copy number neutral loss of heterozygosity was con- 
sidered when the size of alteration was larger than 5 Mb. Acquired copy number 
neutral loss of heterozygosity was observed in 28 regions, 16 of them affecting 
known driver genes that already contained mutations, resulting in homozygous 
deletion of mir-15a/mir-16 at 13q14, or inactivation of ATM and TP53 
(Supplementary Table 4). According to the literature, the presence of chromo- 
thripsis was considered when at least seven switches between two or more copy 
number states were detected on an individual chromosome in which LOH was 
retained, and chromoplexy was defined when at least three chained chromosomal 
rearrangements were detected in a tumour. In one case in which genotyping
data were not available, we used exome2cnv to identify CNAs from WES data.
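
The switch-counting criterion for chromothripsis can be sketched as follows. This is a minimal illustration that takes the copy-number states of consecutive segments along one chromosome and counts how often the state changes. The function names are hypothetical, and the sketch does not check the requirement that LOH be retained.

```python
def count_copy_number_switches(states):
    """Count changes of copy-number state between adjacent segments.

    `states` lists the copy number of each segment, ordered by position
    along a single chromosome.
    """
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


def is_chromothripsis_candidate(states, min_switches=7):
    """Flag a chromosome with at least `min_switches` state changes."""
    return count_copy_number_switches(states) >= min_switches
```

A chromosome oscillating as 2, 1, 2, 1, 2, 1, 2, 1 shows seven switches and would be flagged, whereas a single deletion (2, 1, 2) would not.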
For the identification of breakpoints in WGS derived from structural variants, 
we used SMUFIN, a program that directly compares sequence reads from 
normal and tumour samples, to identify chromosomal breakpoints correspond- 
ing to large structural variants at base-pair resolution. We analysed 150 tumour/ 
normal whole-genome pairs setting the cross-sample contamination filter to 5%. 
Two WGS tumours (019 and 029) showed an abnormal number of breakpoints 
owing to the presence of sequence lanes with high error rates that interfere with 
SMUFIN and were not considered for this analysis. All predicted breakpoints that 
were not confirmed through the BAM file after manual inspection were systematic- 
ally discarded. A total of 48 out of 53 (91%) selected predicted breakpoints could be 
verified using PCR amplification followed by Sanger sequencing (Supplementary 
Table 5). This verification rate is similar to the one observed in our initial description 
of the method. In addition, custom scripts were used to identify potential transloca- 
tions involving immunoglobulin genes either in WGS or WES. This resulted in the 
identification of ten cases (5 WGS and 5 WES) containing putative translocations 
with the BCL2 locus (nine with the t(14;18)(q32;q21), and one with the 
t(2;18)(p11;q21) translocation), all of which were confirmed by either fluorescence 
in situ hybridization (FISH), cytogenetics or PCR (Extended Data Fig. 3). 
G-banding and FISH analysis. Conventional cytogenetics was performed on 
Giemsa-banded chromosomes (G-banding) obtained after a 72-h culture and 
stimulation with tetradecanoyl-phorbol-acetate. At least 20 G-banded meta- 
phases per sample were analysed. Results were described according to the 
International System for Human Cytogenetic Nomenclature. FISH analyses 
on fixed cells were performed using probes that interrogated for 11q23/ATM, 
13q14.3 and 17p13/TP53 deletions and trisomy 12 (Abbott Molecular). Two 
hundred nuclei were examined for each probe. LSI IGH/BCL2 dual colour 
fusion for the t(14;18)(q32;q21) (Abbott Molecular) was used to confirm BCL2
rearrangements detected by WGS and WES. Additionally, in case 853, whole
chromosomal paintings of chromosomes 8, 11 and X were performed to determine
the complex karyotype (with four derivative chromosomes) and the
rearrangements predicted by the SMUFIN algorithm.

©2015 Macmillan Publishers Limited. All rights reserved

Analysis of DNA methylation. DNA methylation was analysed using the 450k 
Human Methylation Array (Illumina). We used the EZ DNA Methylation Kit 
(Zymo Research) for bisulphite conversion of 500 ng of genomic DNA, and the 
Infinium methylation assay was carried out as described by the manufacturer.
These array experiments were performed at CeGen (http://www.cegen.org). Data
from the 450k Human Methylation Array were analysed in R using the minfi
package (version: 1.6.0), available through the Bioconductor open source
software, applying several custom filters. Unsupervised analyses were performed by
principal component analysis and differential methylation between individual 
CLL/MBL samples and controls was detected using an absolute difference of 0.25. 
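
The differential-methylation threshold can be illustrated with a short sketch that compares per-CpG beta values (methylation fractions between 0 and 1) of a sample against a control. The function name and the CpG identifiers are hypothetical, and treating the 0.25 cut-off as inclusive is an assumption; the actual analysis was carried out in R with minfi.

```python
def differentially_methylated(sample_beta, control_beta, min_abs_diff=0.25):
    """Return the CpG ids whose beta value differs from control by at
    least `min_abs_diff` (inclusive threshold is an assumption here)."""
    return sorted(
        cpg
        for cpg, beta in sample_beta.items()
        if cpg in control_beta
        and abs(beta - control_beta[cpg]) >= min_abs_diff
    )


# Hypothetical beta values for three CpGs.
sample = {"cg_a": 0.9, "cg_b": 0.5, "cg_c": 0.1}
control = {"cg_a": 0.5, "cg_b": 0.4, "cg_c": 0.6}
hits = differentially_methylated(sample, control)
```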
Gene expression profiling. We studied the gene expression profiling of 468 
cases using highly purified leukaemic CLL cells. Total RNA was extracted with 
the TRIzol reagent following the recommendations of the manufacturer 
(Invitrogen Life Technologies). RNA integrity was examined with the Agilent 
2100 Bioanalyzer (Agilent Technologies) and only high-quality RNA samples 
were hybridized to Affymetrix Human Genome Array U219 array plates accord- 
ing to Affymetrix standard protocols. Summarized expression values were 
computed using the robust multichip average approach implemented in the 
Expression Console Software (Affymetrix Inc.). 

RT-PCR. cDNA was synthesized from 500 ng of total RNA using High Capacity 
RNA-to-cDNA kit (Life Technologies) following the manufacturer’s instructions. 
Amplification was performed on 50 ng of DNA using the Qiagen Multiplex PCR
Kit (Qiagen), and the reaction mix contained 1× Qiagen Multiplex PCR Master
Mix (12.5 μl), primer mix (0.4 μM of each primer) and RNase-free water for a
total reaction volume of 25 μl. For NOTCH1 within-intron splicing, primers used
were: forward 5′-CCTAACAGGCAGGTGATGCT-3′ and reverse
5′-TACTCCTCGCCTGTGGACAA-3′. PCR amplification was performed for NOTCH1
3′ UTR with forward primer 5′-CCTAACAGGCAGGTGATGCT-3′ and reverse
primer 5′-ATCTGGCCCCAGGTAGAAAC-3′, PAX5 enhancer first region with
forward primer 5′-TAGATTGTGCCGAATGCTGA-3′ and reverse primer
5′-ACAAGCTCTCCTCCCAGGAA-3′, and PAX5 enhancer second region with forward
primer 5′-AGGATGAGAACGGGCAAAC-3′ and reverse primer
5′-GGAGCTTCCAGCTGAACTGA-3′. All PCR products were run on a capillary electrophoresis
gel (QIAxcel Advanced System, Qiagen) with the QIAxcel DNA screening 
kit (Qiagen). 

Western blot analysis. For western blot analysis, tumour cells were lysed for 
30 min in Triton buffer (1% Triton X-100, 50 mM Tris-HCl, pH 7.6, 150 mM 
NaCl, 1mM EDTA) supplemented with protease and phosphatase inhibitors 
(1 mM PMSF, 2 mM sodium pyrophosphate, 2 mM sodium β-glycerophosphate,
1 mM NaF, 1 mM sodium orthovanadate, 10 μg ml⁻¹ leupeptin and 10 μg ml⁻¹
aprotinin). Lysates were cleared by centrifugation at 15,000g at 4 °C for 15 min,
and protein concentrations determined using the Bradford method. Thirty 
micrograms of protein was separated by SDS-PAGE and transferred onto 
Immobilon-P membranes. Membranes were blocked with 2.5% phosphoBlocker
(Cell Biolabs) in TBS-Tween 20. For protein immunodetection, the following
primary antibodies were used: anti-cleaved NOTCH1 (Val1744) (D3B8;
Cell Signaling Technology) and β-actin (Sigma). Anti-rabbit and anti-mouse
horseradish peroxidase-labelled IgG (Sigma) were used as secondary antibodies. 
Chemiluminescence was detected by using ECL substrate (Pierce) on a mini- 
LAS4000 Fujifilm device (GE Healthcare). 

Immunohistochemical analysis. NOTCH1 immunohistochemical staining was 
performed on a Leica Bond system using formalin-fixed paraffin-embedded tissue
sections. Samples were pre-treated using heat-mediated antigen retrieval
with EDTA buffer (pH 9.0), epitope retrieval solution 2 (HIER2) for 30 min.
Then, sections were incubated with anti-cleaved NOTCH1 rabbit monoclonal
antibody (clone D3B8, catalogue number 4147, Cell Signaling Technology) at a
final concentration of 8.5 μg ml⁻¹ for 60 min at room temperature and detected
using a horseradish peroxidase (HRP)-conjugated compact polymer system. 
DAB was used as the chromogen. The section was then counterstained with 
haematoxylin and mounted with DPX. 

Sanger sequencing. PCR products were treated using ExoSap IT (USB 
Corporation) and sequenced with ABI Prism BigDye terminator v3.1 (Applied 
Biosystems) and 5 pmol of each primer. Sequencing reactions were run on an 
ABI-3730 automated sequencer (Applied Biosystems). All sequences were exam- 
ined with the Mutation Surveyor DNA Variant Analysis Software (Softgenetics). 
ChIP-seq and DNase-seq. ChIP-seq was performed in normal B-cell subpopula- 
tions and in cells (>90% tumour cell content) of a CLL patient with mutated 
IGHV, and DNase-seq only in the latter following standard protocols generated 
within the Blueprint Consortium. In brief, cells for ChIP-seq were fixed for
8-16 min in 1% formaldehyde at 4 °C, and chromatin was sonicated for 15 min
with a Bioruptor (Diagenode). Chromatin fragments ranging from 50 to 500 bp
were selected and immunoprecipitation was carried out with antibodies from
Diagenode against H3K4me3 (pAb-003-050 lot:A5051-001P), H3K4me1
(pAb-194-050 lot:A1863-001P) and H3K27ac (pAb-196-050 lot:A1723-0041D) using
approximately 500,000 cells per antibody. DNase I digestion was performed using
60 units of the enzyme (Sigma) and 2.5 million cells. ChIP-seq and DNase-seq
libraries were constructed using the Kapa Hyper Prep Kit (Kapa Biosystems). For
each experiment, from 25 to 50 million reads were sequenced with an Illumina
HiSeq2000 sequencer. Detailed protocols can be obtained from the Blueprint
Consortium (http://www.blueprint-epigenome.eu/index.cfm?p=7BF8A4B6-F4FE-861A-2AD57A08D63D0B58).

4C-seq. 4C-seq template generation and amplification was performed as prev- 
iously described®***. In brief, 1 X 10” cells of two CLL patients were crosslinked 
with 2% formaldehyde (Merck), chromatin was digested with DpnII (New England 
Biolabs) followed by ligation with T4 ligase (Roche). Next, chromatin was decross- 
linked, DNA was digested with Csp6I (NEB) and re-ligated. PCR amplification of 
viewpoint regions and their ligated fragments was performed using primers
5′-TGCCACACCTCCTTTTGATC-3′ and 5′-CCTTGTGGAAAGAGTCTCAC-3′
(PAX5 putative enhancer, viewpoint fragment-end chr9:37,370,916-37,371,635)
or 5′-CCGAGCTGGGGTAGCTGATC-3′ and 5′-TTGTGTCCAAAAGTTGTTTG-3′
(PAX5 promoter, viewpoint fragment-end chr9:37,033,553-37,034,192).
Samples were sequenced using a MiSeq instrument (Illumina)
using 50-bp single-end reads, and adding 5% PhiX control DNA. Data analysis 
was performed using 4Cseqpipe version 0.7 (May 2012) (downloaded from 
http://compgenomics.weizmann.ac.il/tanay/). Before mapping of the interacting 
regions to the genome, reads that are a consequence of undigested templates or 
self-ligation of the viewpoint fragment were removed. 

Deletion and mutation of human PAX5 enhancer in B-cell lines using 
CRISPR/Cas9. Human PAX5 enhancer was deleted or mutated in RAMOS cells
and in an Epstein-Barr virus (EBV)-transformed lymphoblastoid B-cell line
using CRISPR/Cas9 genome editing. Guide RNAs (gRNAs) were designed using
the E-CRISP tool (http://www.e-crisp.org/E-CRISP/index.html). For the deletions,
four gRNAs were designed flanking the PAX5 enhancer, two at each side (L1/L2
and R1/R2) to be used in combinations (L1+R1, L1+R2, L2+R1, and L2+R2).
In addition, two gRNAs were designed to target sites of mutations found in CLL
(M1/M2). The gRNA sequences are: L1, 5′-GGGAACCAGGGCGTGGGAGC-3′;
L2, 5′-GTGAGGCAGAAACACCACAG-3′; R1, 5′-GGCAGCATGCGGGCGTCATG-3′;
R2, 5′-GCCAGGACCTGCTCTCCCAA-3′; M1, 5′-GTGAAAATTTACTCATGCTG-3′;
and M2, 5′-GGTGGTACTCAGAGGCTGGG-3′. The
gRNA oligonucleotides were cloned in pL-CRISPR.EFS.GFP vector (Addgene
plasmid 57818), and lentiviral particles were produced on HEK293T cells by
cotransfection with Gag-Pol and vesicular stomatitis virus G (VSV-G)-expressing 
vectors using the JetPEI transfection reagent (Polyplus). Viral supernatants were 
collected after 48 h and used for infection by spinoculation of Ramos and EBV- 
transformed lymphoblastoid B cells. After infection, green fluorescent protein 
(GFP)-positive cells were sorted (BD Influx, BD Bioscience) and grown for 
1 week. Total RNA was extracted with TRIzol (Invitrogen) and converted into 
cDNA with SuperScript First-Strand Synthesis System (Invitrogen). Then, 
human PAX5 expression was determined by quantitative real-time PCR 
(FastStart Universal SYBR Green Master Mix, Roche) using a 7500 Real-Time 
PCR system (Applied Biosystems). GAPDH was used as normalization control. 
The following primers were used: PAX5 forward, 5′-GAGCGGGTGTGTGACAATGA-3′;
PAX5 reverse, 5′-GCACCGGAGACTCCTGAATAC-3′;
GAPDH forward, 5′-GAAGGTGAAGGTCGGAGT-3′; and GAPDH reverse,
5′-GAAGATGGTGATGGGATTTC-3′.

To analyse the efficiency of the CRISPR/Cas9-induced deletions, DNA was 
extracted and PAX5 enhancer was PCR-amplified using HotStarTaq DNA 
Polymerase (Qiagen) and PAX5 enhancer-flanking oligonucleotides (forward)
5′-GTTGTCTTGGAGGACTTTCAG-3′ and (reverse) 5′-GTGTTATTGTGTATGTGGCAG-3′.
To determine the presence of CRISPR/Cas9-induced mutations
we performed heteroduplex cleavage assays using the Guide-it Mutation
Detection Kit (Clontech) with primers (forward) 5′-AGGATGAGAACGGGCAAAC-3′
and (reverse) 5′-GGAGCTTCCAGCTGAACTGA-3′.
Statistical analysis. Fisher’s test or non-parametric tests were used to correlate 
clinical and biological variables according to MBL or CLL, and the presence or 
absence of the different drivers herein analysed. We evaluated the clinical effect 
(TTT and overall survival) of all driver mutated genes and chromosomal regions 
with recurrent CNAs in 5 (1%) or more patients. TTT was evaluated only in 
patients with Binet A and B. TTT and overall survival curves from the date of 
sampling were plotted by the Kaplan-Meier method and compared by the log-rank
test. We examined separately the prognostic impact of point mutations in
driver genes (substitutions or small indels) and CNAs. The clinical impact (TTT)
of TP53, ATM and BIRC3 mutations was relatively similar to that of the loss of
their respective chromosomal region, that is, del(17p) (TP53) and del(11q) (ATM
and BIRC3), respectively (Extended Data Fig. 8). Therefore, to evaluate the
prognostic impact for each gene/region, both types of alterations were combined.
Although the clinical effect of deletions and mutations was somewhat different
for del(6q15)/ZNF292 (Extended Data Fig. 8), because most point
mutations in ZNF292 were truncating, we also combined these two alterations to
investigate the clinical effect. Finally, the number of cases with mutations or 
CNAs in the respective chromosomal region of 6p21/NFKBIE, 10q24/NFKB2, 
and 15q15/MGA was too small to perform a separate analysis and therefore we 
also combined both types of alterations. Multivariate Cox regression analysis was 
used to assess the independent prognostic impact from Binet stage and IGHV 
mutational status of each driver in the outcome of the patients. Proportional 
hazards were checked using Schoenfeld’s test. We adjusted all the P values for 
multiple comparisons using the Benjamini-Hochberg correction. All statistical 
tests were two-sided and results were considered statistically significant
at an adjusted P ≤ 0.05. All the analyses were performed using the SPSS 20
software (http://www.ibm.com) or R software v3.1.3. 
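
For readers unfamiliar with the Benjamini-Hochberg correction mentioned above, the following is a minimal pure-Python sketch of the standard step-up adjustment. It is an illustration only; the analyses themselves were run in SPSS and R, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay
    # monotone: each is the minimum of p * m / rank over equal or higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A driver would then be reported as significant when its adjusted value falls at or below the chosen threshold.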

Recurrently mutated genes in CLL were defined considering number and type 
of mutations, gene size and coverage, and local density of mutations derived from 
the 150 CLL/MBL WGS studies. To test whether a gene was mutated more 
frequently than expected by chance, we calculated the basal probability p for each
gene as:

$$p = \frac{n_{ns} L \delta}{(n_{ns} + n_{s}) E}$$

In this equation, $n_{ns}$ is the total number of possible non-synonymous
mutations for this gene, $n_{s}$ the total number of possible synonymous mutations, L is the
effective length of the gene open reading frame (ORF), defined as the sum of the
number of bases of the ORF for that gene which are callable at 10× coverage for
all exomes or whole genomes analysed, and E is the effective length of all coding
regions analysed, defined as the sum of the total lengths of the coding regions that
are callable at 10× coverage for all exomes or whole genomes. Finally, δ is the local
density of mutations for this locus, which is determined by dividing the number of
somatic mutations identified in the 150 WGS studies analysed in a 0.5-Mb region
centred on the gene of interest. Thus, the probability P to find M or more non-
synonymous mutations in a given gene from a set of N total number of somatic 
mutations in all patients is: 
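
The displayed formula is missing from this copy of the text. If each of the N somatic mutations is assumed to fall in the gene independently with the basal probability defined above (written p here), the quantity described is the upper tail of a binomial distribution. The expression below is that standard form, given as an assumption to be checked against the published article:

$$P = \sum_{i=M}^{N} \binom{N}{i}\, p^{i} (1-p)^{N-i}$$
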
A score is computed by taking the base-10 logarithm of this probability (P). 
Genes for which more than 10% of somatic mutations caused a synonymous 
change were removed. Finally, 1,000 Monte-Carlo simulations were performed 
to estimate the FDR based on the total number of mutations observed (N), and the 
local mutational density for each gene. To identify genes that might be recurrently 
mutated in an IGHV subgroup, the same analysis was performed only with 
tumours belonging to the same group (IGHV-MUT or IGHV-UNMUT), and 
adjusting the local density of mutations for each subgroup according to the muta- 
tions obtained from WGS data. Genes were classified in three different tiers 
(Extended Data Table 2). Tier 1 corresponds to those genes that were identified
as significantly mutated in CLL as described above. Tier 2 includes those genes that
are not significantly mutated when analysing CLL, but appeared significant when
only one subclass (IGHV-MUT or IGHV-UNMUT) was considered. In addition,
genes showing either recurrent mutations affecting the same residue, or resulting in
mainly loss-of-function mutations, were included in tier 2. Finally, genes classified
in tier 3 include those genes that were not in tiers 1 or 2, but contained somatic
mutations previously described as driver mutations in the literature.
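
The recurrence test described above (basal probability, tail probability and log score) can be sketched as follows. The binomial form of the tail probability is an assumption, since the displayed formula is not preserved in this text, and all function and parameter names are illustrative. The Monte-Carlo FDR step and the synonymous-mutation filter are not included.

```python
import math


def basal_probability(n_ns, n_s, gene_len, total_len, local_density):
    """Basal probability: n_ns * L * delta / ((n_ns + n_s) * E)."""
    return n_ns * gene_len * local_density / ((n_ns + n_s) * total_len)


def binomial_upper_tail(m, n, p):
    """P(X >= m) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are computed in log space so that large n does not overflow.
    """
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(m, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        total += math.exp(log_term)
    return total


def recurrence_score(m, n, p):
    """Base-10 logarithm of the tail probability."""
    return math.log10(binomial_upper_tail(m, n, p))
```

More negative scores correspond to genes that carry more non-synonymous mutations than the local background would predict.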

A sample size of at least 500 tumours was selected during the ICGC study 
design, as this will give enough power to detect driver genes mutated in at least 3% 
of tumours. 

In silico prescription. Drugs with potential therapeutic interactions with driver 
oncogenic protein products were retrieved as described. 
Extended Data Figure 1 | Molecular characterization of CLL and MBL
subtypes. a, CLL and MBL cases are divided according to the somatic
hypermutation mutational status of their clonotypic IGHV genes into
IGHV-MUT (black) and IGHV-UNMUT (grey) subgroups. Clinical and molecular
data from 506 cases profiled with four different platforms are shown.
Chromosome 13 is shown in detail. Und, undetermined. b, Box plot showing
the total number of somatic mutations identified in IGHV-MUT and
IGHV-UNMUT cases by WGS (*P <3 X 10 *). c, Main mutational signatures
identified by WGS. d, Relationship between total number of mutations and
contribution of signature 2 and the IGHV-status of tumours (red: IGHV-MUT;
blue: IGHV-UNMUT; grey: undetermined).
Extended Data Figure 2 | Distribution of CNAs and structural variants in 
506 cases of CLL and MBL. a, The total number of CNAs detected per 

case is indicated on top. Clinicobiological characteristics of patients (CLL/MBL 
and IGHV status) are shown on the middle row (MBL and IGHV-UNMUT 
depicted with green lines), together with the presence of chromothripsis. The 
main DNA copy number alterations identified are shown on the bottom. 
The presence of a deletion is indicated by a red line, homozygous deletion by a 
Bordeaux colour, blue lines indicate the presence of a gain, translocation 
t(14;18) is shown in green, grey lines represent the absence of alteration, and 
white lines indicate that no information is available for the t(14;18) for that 
particular case. b, Circular diagram representation of the distribution of 
structural variants detected in 148 WGS CLL samples. Displayed in the outer 
layer we show recurrence in CNAs, followed by all the breakpoints derived 
from large (> 100 bp) intra- and inter-chromosomal rearrangements (dark 
blue) in the inner layer. For clarity, we have set the scale of CNAs to 20%, as 
the maximum, showing sequence gains and losses, as positive (blue) and
negative (red) values, respectively. Rearrangements are displayed in absolute
counts, indicating that the values in each of the regions do not reflect the 
recurrence among samples, as some regions with high values derive from one or 
two cases, normally with complex karyotypes. We highlighted with dashed 
squares those regions (3p21, 11q23, 13q14, 14q32 and 18q21) with 
rearrangements observed in more than 5% of cases with WGS. As to 
rearrangement events, a total of 358 breakpoints were detected across all 148 
samples; 41% of them correspond to interchromosomal translocations, while 
59% occurred within chromosomes. Chromosomes 11 and 13 appear as the 
most rearranged, entailing 25% of all the breaks, followed by chromosomes 3 
and 6 (with 8% each). Regarding interchromosomal rearrangements, 
chromosomes 6, 8, 13 and 14 appear as the most translocated, being involved in 
32% of all translocations observed. Recurrent breakpoints are indicated by 
arrows: black arrows for rearrangements affecting 18q21 and BCL2 (four cases 
with 14q32 and one case with 2p11) and blue arrows for rearrangements 
affecting 13q14 (nine cases with different chromosomes). 
Extended Data Figure 3 | Schematic view of the translocations involving 
BCL2 and patterns of complex structural variants in the WGS of a CLL case. 
a, A total of nine translocations t(14;18)(q32;q21) were identified, resulting 
in the fusion of the IGH enhancer on the 3’ UTR of BCL2, as well as one 
translocation between the IGK locus on chromosome 2, t(2;18)(p11;q21), 
which affected the promoter region of BCL2. Cases with these translocations 
had multiple somatic mutations in the 5'-region of BCL2 (arrowheads and 
lollipops). RNA-seq data analysis revealed an allelic imbalance, with the 
rearranged allele usually much more expressed than the germline allele (pie 
charts within lollipops showing in red the mutant allele fraction detected by 
RNA-seq for each somatic mutation), probably reflecting the effect of the 
translocation on the expression of BCL2 and recruitment of the SHM 
machinery to this locus. b, Gene expression analysis revealed that the presence 
of the t(14;18)(q32;q21) resulted in overexpression of BCL2 in these cases 
when compared with other CLL or MBL cases. c, FISH analysis of CLL cells 
from case 151 using a dual colour fusion probe for IGH (green) and BCL2 (red). 
Fusion signals are indicated with arrows. d, Case numbers and genomic 
coordinates for the detected translocations between immunoglobulin genes and 
BCL2. e, Circular representation of structural variants detected in six CLL 
tumours with complex rearrangements including four cases with chromoplexy 
(samples 16, 141, 294 and 753), chromothripsis (sample 880) and combined 
(sample 853). Chromosomes are represented in the outer layer, regions 
lost (red) and gained (blue) detected by SNP arrays are shown in the inner layer. 
Inter and intrachromosomal rearrangements are represented as black and blue 
lines, respectively. f, Reconstruction at base pair resolution of the resulting 
reorganized chromosomes in case 853 including der(X) in yellow, der(2) in 
dark blue, der(8) in green, and der(11) in red. In these reconstructions, only 
reorganized fragments larger than 100 bp are represented unless they involve 
interchromosomal translocations. Rearranged regions are not drawn to scale. 
Arrows denote inverted fragments relative to their normal and original 
orientation. Flanking portions of the derivative chromosomes without detected 
rearrangements are collapsed and shown as broken boxes. Estimated sizes 

(in Mb) for the resulting derivative chromosomes are shown on the left side, 
including the fraction (percentage) relative to the corresponding normal 
chromosome size. Asterisks indicate breakpoints that have been experimentally 
studied and verified. Genes disrupted by breakpoints are displayed on the left 
side of each of the proposed derivative chromosomes in purple. g, Whole- 
chromosome painting confirmed the sequencing reconstruction proposed in b. 
Simultaneous painting of chromosome 8 (green) and 11 (red) shows a normal 
chromosome 11 and a shorter chromosome der(11) as well as a normal 
chromosome 8 and der(8) that contains a fragment of chromosome 11 
inserted below the centromeric region. In addition, a small fragment of 
chromosome 8 is detected in the telomeric region of derivative chromosome 2. 
or non-coding mutations in NOTCH1 (case numbers are indicated inside). 
Extended Data Figure 5 | Effect of mutations in the PAX5 enhancer on gene expression. Comparative analysis of gene expression between IGHV-MUT CLL 
tumours with or without (WT) mutations in the PAX5 enhancer for 15 genes located around the recurrently mutated enhancer in CLL and MBL samples. 
Extended Data Figure 6 | PAX5 enhancer deletion downregulates PAX5 
expression in human B cell lines. a, PCR analysis of CRISPR/Cas9 deletion of 
PAX5 enhancer in lymphoblastoid B cells (left) and RAMOS cells (right). 

b, Quantitative RT-PCR (RT-qPCR) analysis of PAX5 expression in PAX5 
enhancer deleted lymphoblastoid B cells (left) and RAMOS cells (right). Bars 
represent mean relative PAX5 mRNA levels after normalization to GAPDH 
expression and relative to wild-type cells. Error bars represent the s.d. between 
technical triplicates of CRISPR/Cas9-induced mutations in PAX5 enhancer in 
lymphoblastoid B cells (left) and RAMOS cells (right). c, PCR analysis of 
CRISPR/Cas9-introduced mutations in the PAX5 enhancer in lymphoblastoid 
B cells (left) and RAMOS cells (right). d, RT-qPCR analysis of PAX5 expression 
in PAX5-enhancer-mutated lymphoblastoid B cells (left) and RAMOS cells 
(right). Bars represent mean relative PAX5 mRNA levels after normalization to 
GAPDH expression and relative to wild-type cells. Error bars represent the s.d. 
between technical triplicates (*P < 0.05; **P < 0.01). 
Extended Data Figure 7 | Distribution of genetic, epigenetic and expression 
features in CLL. Distribution of genetic features, family of IGHV 
rearrangements and BCR stereotypes in naive cell-like CLL cases, intermediate 
CLL and memory-cell-like CLL cases. a, Frequency of driver mutations. 

b, c, Copy number alterations (b) and contribution of signature 2 (c) according 
to the epigenetic classification (green: naive-like; red: memory-like; yellow: 
intermediate). MBL patients were excluded from this analysis. d, Usage of 
IGHV families. e, Proportion of cases with stereotyped IGHV sequences. 

f, Number of cases of each of the stereotyped subsets identified in our series. For 
the analysis shown in e and f, both CLL and MBL patients were merged. The 
asterisk on the top of the bars in a and b indicates that the frequency of the 
genetic feature is higher than expected by chance in one particular epigenetic 


subgroup (P < 0.05). CP, chromoplexy; CT, chromothripsis. g, Relationship 
between genetic and epigenetic alterations in CLL. Correlation between the 
total number of somatic mutations detected by WGS per case and the number 
of CpGs showing differential methylation per case as compared to naive B cells 
(r = 0.64, P< 0.001). h, Correlation between the contribution of signature 2 
mutations and the number of differential CpGs as in a. Tumours are coloured 
according to their IGHV status. i, Comparative analysis of CLL and MBL. 
Principal component analysis of differential methylation (up) and gene 
expression (bottom) data derived from either CLL tumours or MBL samples, 
reveals that MBL samples usually clustered with their corresponding IGHV- 
status CLL samples. 
Extended Data Figure 8 | Kaplan-Meier plot of time to first treatment 
stratified by the type of aberration in ATM, BIRC3, TP53 and ZNF292 
genes. TTT curves of the 386 untreated patients with Binet stage A or B. Cases 
are stratified according to the gene mutation status: wild type (green line), 
mutated and mutated+deleted (Mut, blue line) or deleted (Del, red line). The 
log-rank P-values comparing the mutated (blue line) and the deleted (red line) 
cases are shown. 

[Figure panels recovered from the extraction. Axes: probability against years. 
BIRC3: WT (n=352), Mut (n=7), Del w/o Mut (n=27); P-value (Mut vs Del w/o Mut) = 0.671. 
ZNF292: WT (n=367), Mut (n=11), Del w/o Mut (n=8); P-value (Mut vs Del w/o Mut) = 0.003.] 


©2015 Macmillan Publishers Limited. All rights reserved 


ARTICLE 


Extended Data Table 1 | Clinical information at the time of sampling of 452 patients with CLL and 54 with MBL 


Parameter                          Category        CLL (n = 452)       MBL (n = 54) 
Gender                             Male / Female   275/177             25/29 
Age (years)                                        66 (19-93)          71 (39-89) 
Clinical status                    Diagnosis       92                  13 
                                   Stable disease  242                 41 
                                   Progression     118                 – 
Hastie                                             3.12 (0-25.9)       3.68 (0-21.9) 
Binet stage                        A               297 (68%) 
                                   B               96 (22%) 
                                   C               44 (10%)            – 
                                   Unknown         15 
Rai stage                          0               206 (51%) 
                                   I-II            140 (35%) 
                                   III-IV          58 (14%)            – 
                                   Unknown         48 
Lymphocytes (×10⁹/L)                               21.8 (1.69-300)     5.9 (3-11.2) 
Absolute clonal B cells (×10⁹/L)                   14.3 (0.3†-209.7)   3.4 (1.1-4.99) 
Hemoglobin (g/L)                                   135 (45-176)        142 (117-181) 
Platelets (×10⁹/L)                                 181 (7-791)         206 (57‡-394) 
LDH                                >UNL            48/376 (13%)        1/53 (2%) 
CD38                               High            114/430 (26%)       5/54 (9%) 
ZAP-70                             High            98/430 (23%)        14/54 (26%) 
CD49d                              High            100/423 (24%)       12/54 (22%) 
Cytogenetics                       Del13q          222 (49%)           31 (57%) 
                                   Trisomy 12      68 (15%)            5 (9%) 
                                   Del11q22        44 (10%)            2 (4%) 
                                   Del17p          17 (4%)             1 (4%) 
IGHV                               Unmutated       166/445 (37%)       13/51 (25%) 
Epigenetic subtype                 Naive-like      151/446 (34%)       11/54 (20%) 
                                   Intermediate    64/446 (14%)        5/54 (9%) 
                                   Memory-like     231/446 (52%)       38/54 (70%) 
Follow-up (years)                  All surviving   2.9 (0.1-14.1)      2.1 (0.8-9.7) 
5-year TTT (95% CI)                All             55% (46-64%)        2% (0-6%) 
5-year OS (95% CI)                 All             78% (71-85%)        96% (91-100%) 

Sampling always before therapy; quantitative parameters are expressed as median (range); CD38 high: > 30% positive CLL cells; ZAP70 high: > 20% positive cells; IGHV-unmutated: > 98% identity with germ line. OS, overall survival. 

*MBL only with CLL-like immunophenotype. 

†Lower values corresponded to small lymphocytic lymphoma. 

‡MBL with low platelet count due to chronic liver disease. 
Extended Data Table 2 | Recurrently mutated genes in CLL and MBL by WGS, WES or CNAs 


Columns, in order: Symbol; Total CLL cases mutated; Score; Frequency CLL (n=452); Frequency CLL IGHV-UNMUT; Frequency CLL IGHV-MUT; Frequency MBL (n=54); Frequency MBL IGHV-UNMUT; Frequency MBL IGHV-MUT; Tier; Effect. Frequencies are percentages. 

NOTCH1 57 92.23 12.61 28.92 2.88 5.56 23.08 0.00 1 Activating 
ATM 50 56.50 11.06 24.70 2.88 5.56 15.38 0.00 1 Truncating 
SF3B1 39 56.47 8.63 13.25 6.12 3.70 7.69 2.63 1 Activating 
BIRC3 40 56.41 8.85 19.28 2.52 3.70 7.69 2.63 1 Truncating 
CHD2 27 81.96 5.97 3.61 7.55 7.41 7.69 7.89 1 Truncating 
TP53 24 56.38 5.31 7.83 3.96 1.85 7.69 0.00 1 Truncating 
ZNF292 21 56.95 4.65 10.24 1.08 7.41 30.77 0.00 1 Truncating 
MYD88 18 66.46 3.98 0.00 6.47 0.00 0.00 0.00 1 Activating 
KLHL6 14 63.49 3.10 0.00 5.04 0.00 0.00 0.00 1 - 
POT1 14 44.11 3.10 8.43 0.00 7.41 23.07 2.63 1 Truncating 
MGA 13 38.50 2.88 6.63 0.36 3.70 15.38 0.00 1 Truncating 
DDX3X 12 38.99 2.65 5.42 0.72 0.00 0.00 0.00 1 Truncating 
TRAF3 14 38.38 3.10 5.42 1.44 0.00 0.00 0.00 1 Truncating 
SETD2 10 33.49 2.21 4.82 0.72 0.00 0.00 0.00 1 Truncating 
BRAF 9 26.75 1.99 4.82 0.00 1.85 0.00 0.00 1 Activating 
SYNE1 8 16.17 1.77 1.20 1.44 0.00 0.00 0.00 1 - 
XPO1 8 24.49 1.77 4.82 0.00 1.85 7.69 0.00 1 Activating 
IRF4 6 25.19 1.33 1.81 0.72 0.00 0.00 0.00 1 Activating 
EGR2 9 27.83 1.99 4.22 0.72 0.00 0.00 0.00 1 Activating 
CCND2 6 19.28 1.33 1.20 1.44 1.85 0.00 2.63 1 Activating 
ZMYM3 8 19.43 1.77 3.61 0.72 0.00 0.00 0.00 1 Truncating 
ARID1A 7 17.98 1.55 1.81 1.44 0.00 0.00 0.00 1 Truncating 
ATRX 7 14.79 1.55 2.41 1.08 0.00 0.00 0.00 1 - 
NFKBIE 5 18.95 1.11 2.41 0.36 0.00 7.69 0.00 1 Truncating 
CNOT3 6 16.41 1.33 1.81 1.08 1.85 7.69 0.00 1 Activating 
BCOR 6 14.99 1.33 3.61 0.00 0.00 0.00 0.00 1 Truncating 
MED12 6 14.11 1.33 3.61 0.00 0.00 0.00 0.00 1 Activating 
PTPN11 5 14.79 1.11 2.41 0.36 0.00 0.00 0.00 1 Activating 
FBXW7 5 14.21 1.11 2.41 0.36 0.00 0.00 0.00 1 Activating 
NXF1 6 13.99 1.33 2.41 0.72 0.00 0.00 0.00 1 Truncating 
SETD1A 5 12.62 1.11 3.01 0.00 0.00 0.00 0.00 1 - 
ASXL1 5 12.24 1.11 2.41 0.36 0.00 0.00 0.00 1 Truncating 
FSIP2 7 13.86 1.55 1.20 1.44 0.00 0.00 0.00 1 Truncating 
RPS15 4 14.62 0.88 2.41 0.00 0.00 0.00 0.00 1 Activating 
FUBP1 4 14.33 0.88 2.41 0.00 0.00 0.00 0.00 1 Truncating 
HIST1H1B 4 12.75 0.88 0.00 1.44 1.85 7.69 0.00 1 - 
SPEN 6 11.48 1.33 1.81 1.08 0.00 0.00 0.00 2 Truncating 
KIAA0947 5 10.97 1.11 0.60 1.44 0.00 0.00 0.00 2 Truncating 
MLL2 5 10.77 1.11 0.00 1.80 0.00 0.00 0.00 2 Truncating 
POLR3B 4 9.27 0.88 1.20 0.72 0.00 0.00 0.00 2 Activating 
BAZ2A 3 6.57 0.66 1.20 0.36 0.00 0.00 0.00 2 Truncating 
BAX 3 9.21 0.66 0.00 1.08 0.00 0.00 0.00 2 Truncating 
KRAS 3 9.02 0.66 1.81 0.00 0.00 0.00 0.00 2 Activating 
LUC7L2 3 8.71 0.66 0.60 0.72 0.00 0.00 0.00 2 Truncating 
IKZF3 3 8.53 0.66 1.81 0.00 1.85 0.00 0.00 2 Activating 
DNAJC11 3 8.22 0.66 1.20 0.36 0.00 0.00 0.00 2 Truncating 
ZC3H18 3 7.38 0.66 0.60 0.72 0.00 0.00 0.00 2 Truncating 
SKIV2L2 3 7.22 0.66 1.20 0.36 0.00 0.00 0.00 2 Activating 
CREBBP 3 6.43 0.66 0.60 0.72 0.00 0.00 0.00 2 Truncating 
ANKHD1 3 6.26 0.66 1.20 0.36 1.85 0.00 2.63 2 Truncating 
NKAP 2 5.60 0.44 0.00 0.72 0.00 0.00 0.00 2 Truncating 
TLR2 2 4.94 0.44 0.00 0.72 0.00 0.00 0.00 2 Activating 
MED1 2 4.53 0.44 1.20 0.00 0.00 7.69 0.00 2 Activating 
CDKN1B 1 - 0.22 0.00 0.36 0.00 0.00 0.00 3 Truncating 
CDKN2A 1 - 0.22 0.00 0.36 0.00 0.00 0.00 3 Truncating 
CD79A 1 - 0.22 0.00 0.36 0.00 0.00 0.00 3 Truncating 
CD79B 1 - 0.22 0.00 0.36 0.00 0.00 0.00 3 Activating 
IRAK1 1 - 0.22 0.00 0.36 0.00 0.00 0.00 3 Truncating 
NRAS 1 - 0.22 0.60 0.00 0.00 0.00 0.00 3 Activating 
Mutations driving CLL and their 
evolution in progression and relapse 

Which genetic alterations drive tumorigenesis and how they evolve over the course of disease and therapy are central 
questions in cancer biology. Here we identify 44 recurrently mutated genes and 11 recurrent somatic copy number 
variations through whole-exome sequencing of 538 chronic lymphocytic leukaemia (CLL) and matched germline DNA 
samples, 278 of which were collected in a prospective clinical trial. These include previously unrecognized putative 
cancer drivers (RPS15, IKZF3), and collectively identify RNA processing and export, MYC activity, and MAPK signalling 
as central pathways involved in CLL. Clonality analysis of this large data set further enabled reconstruction of temporal 
relationships between driver events. Direct comparison between matched pre-treatment and relapse samples from 59 
patients demonstrated highly frequent clonal evolution. Thus, large sequencing data sets of clinically informative 
samples enable the discovery of novel genes associated with cancer, the network of relationships between the driver 
events, and their impact on disease relapse and clinical outcome. 


In recent years, unbiased massively parallel sequencing of whole 
exomes (WES) in chronic lymphocytic leukaemia (CLL) has yielded 
fresh insights into the genetic basis of this disease’ *. Two important 
constraints have limited previous WES analyses. First, cohort size is 
critical for statistical inference of cancer drivers’, and previous CLL 
WES series* had a power of only 68%, 23% and 7% to detect putative 
CLL genes mutated in 5%, 3% and 2% of patients, respectively 
(http://www.tumorportal.org/power)°. Limited cohort size has also curtailed 
the ability to effectively learn the relationships between CLL driver 
events, such as their co-occurrence and the temporal order of their 
acquisition. Second, the composition of the cohort of previous WES 
studies has limited the ability to accurately determine the impact of 
drivers and clonal heterogeneity on clinical outcome, since they 
included samples collected at variable times from subjects exposed 
to a variety of therapies. 

To overcome these challenges, we analysed WES data from 538 
CLLs, including 278 pre-treatment samples collected from subjects 
enrolled on the phase III CLL8 study*. This trial established the com- 
bination of fludarabine (F), cyclophosphamide (C) and rituximab (R) 
as the current standard-of-care first-line treatment for patients of 
good physical fitness, with a median of >6 years of follow-up. Here 
we report the discovery of novel genes associated with CLL, the com- 
prehensive genetic characterization of samples from patients before 
exposure to a uniform and contemporary treatment, and the unco- 
vering of features contributing to relapse from this therapy. 


Unbiased candidate CLL gene discovery 

We performed WES of CLL and matched germline samples, collected 
from 278 subjects enrolled on the CLL8 trial, with mean read depth 
of 95.0 and 95.7, respectively (Supplementary Tables 1 and 2). 
Consistent with previous CLL WES studies, we detected a mean ± s.d. 
rate of 21.5 ± 7.9 silent and non-silent single nucleotide variants 
(sSNVs) and somatic insertions and deletions (sIndels) per exome 
(Supplementary Tables 2 and 3)'”. 

We inferred candidate cancer-associated genes in CLL through 
implementation of MutSig2CV*’. To maximize statistical sensitivity 
for driver detection®, we combined the CLL8 cohort with two prev- 
iously reported and non-overlapping WES cohorts’”, thereby increas- 
ing the size of the cohort to 538 CLLs. This cohort size is expected to 
saturate candidate CLL gene discovery for genes mutated in 5% of 
patients, and provides 94% and 61% power to detect genes mutated 
in 3% and 2% of patients, respectively”. 
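
The dependence of detection power on cohort size can be illustrated with a simple binomial model. This is a minimal sketch, not the calculation behind the power figures quoted above: the function names, the per-patient background mutation probability, the significance threshold and the smaller comparison cohort of 160 are all hypothetical values chosen for illustration.

```python
from math import comb


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def detection_power(n_patients, driver_freq, background_prob, alpha):
    """Power to call a gene significant under a toy binomial model.

    background_prob: assumed chance that a patient carries a passenger
    mutation in the gene (hypothetical value, not from the paper).
    alpha: assumed per-gene significance threshold (hypothetical).
    """
    # Smallest patient count whose background tail probability is <= alpha.
    k = 0
    while binom_tail(k, n_patients, background_prob) > alpha:
        k += 1
    # A patient is mutated if hit by a driver or by a background event.
    p_alt = driver_freq + background_prob - driver_freq * background_prob
    return binom_tail(k, n_patients, p_alt)


if __name__ == "__main__":
    for n in (160, 538):
        for freq in (0.05, 0.03, 0.02):
            power = detection_power(n, freq, background_prob=0.001, alpha=1e-6)
            print(f"n={n} freq={freq:.0%} power={power:.2f}")
```

Under these assumptions the larger cohort gives higher power at every driver frequency, which is the qualitative point made in the text; the absolute values depend entirely on the assumed background rate and threshold.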

We detected 44 putative CLL driver genes, including 18 CLL 
mutated drivers that we previously identified’, as well as 26 additional 
putative CLL genes (Figs 1 and 2 and Extended Data Figs 1 and 2). In 
total, 33.5% of CLLs harboured a mutation in at least one of these 26 
additional genes. Targeted DNA sequencing as well as variant allele 
expression by RNA-seq demonstrated high rates of orthogonal valid- 
ation (Extended Data Fig. 3). 

Of the newly identified putative cancer-associated genes, some 
were previously suggested as CLL drivers in studies using other 
Figure 1 | The landscape of putative driver gene 
mutations and recurrent somatic copy number 
variations in CLL. Somatic mutation information 
is shown across the 55 putative driver genes and 
recurrent somatic copy number alterations (rows) 
for 538 primary patient samples (from CLL8 
(green), Spanish ICGC (red) and DFCI/Broad 
(blue)) that underwent WES (columns). Blue labels, 
recurrent somatic CNAs; bold labels, putative CLL 
cancer genes previously identified in ref. 3; 
asterisked labels, additional cancer-associated genes 
identified in this study. Samples were annotated for 
IGHV status (black, mutated; white, unmutated; red, 
unknown), and for exposure to therapy before 
sampling (black, previous therapy; white, no 
previous therapy; red, unknown previous treatment 
status). 


(n = 17, 3.2%), which we detected as recurrently inactivated by insertions 
and nonsense mutations, was previously found to be inactivated 
through deletions® and truncating mutations*’ in high-risk CLL 
(Extended Data Fig. 4). A gene set enrichment analysis of matched 
RNA-seq data revealed downregulation of genes that are suppressed 
upon MYC activation in B cells’® (Supplementary Table 4). In addition 
to MGA, we report two additional candidate driver genes that 
Figure 2 | Selected novel, putative driver gene maps. Individual gene 
mutation maps for select putative drivers, showing mutation subtype (for 
example, missense), position and evidence of mutational hotspots, based on 
COSMIC database information (remaining gene maps shown in Extended 
Data Fig. 4). y axis counts at the bottom of the maps reflect the number 
of identified mutations in the COSMIC database. 
probably modulate MYC activity (PTPN11 (ref. 11) (n = 7, 1.3%) and 
FUBP1 (ref. 12) (n = 9, 1.7%)), highlighting MYC-related proteins as 
drivers of CLL. 

Another cellular process affected by novel CLL drivers is the 
MAPK-ERK pathway, with 8.7% of patients harbouring at least one 
mutation in CLL genes in this pathway. These included mutations in 
RAS genes (NRAS, n = 9 and KRAS, n = 14, totalling 4.1%); BRAF 
(n = 21, 3.7%); or the novel putative driver MAP2K1 (n = 12, 2%). 
This finding suggests that further therapeutic exploration of MAPK- 
ERK pathway inhibitors in CLL would be beneficial. Notably, BRAF 
mutations in CLL did not involve the canonical hotspot (V600E) seen 
in other malignancies*’*"’, but rather clustered heavily around the 
activation segment of the kinase domain (Fig. 2). This may be indi- 
cative of a different mechanism of activity’>'®, and has clinical 
implications, as BRAF inhibitors are thought to be less effective for 
non-canonical BRAF mutations’”’’. 

In addition to highlighting novel cellular processes and pathways 
affected in CLL, many of the 26 additional CLL genes more densely 
annotated pathways or functional categories previously identified in 
CLL”, including RNA processing and export (FUBP1, XPO4, EWSR1 
and NXF1), DNA damage (CHEK2, BRCC3, ELF4 (ref. 20) and 
DYRK1A (ref. 21)), chromatin modification (ASXL1, HIST1H1B, 
BAZ2B and IKZF3) and B-cell-activity-related pathways (TRAF2, 
TRAF3 and CARD11). 

We discovered a number of putative CLL drivers previously unre- 
cognized in human cancer. In a first example, we found that RPS15 
was recurrently mutated (n = 23, 4.3%), with mutations localized to 
the carboxy-terminal region (Fig. 2) at highly conserved sites (median 
conservation score of 94 out of 100). This component of the 40S 
ribosomal subunit has not been extensively studied in cancer, 
although rare mutations have been identified in Diamond-Blackfan 
anaemia”. A gene set enrichment analysis revealed upregulation of 
gene sets related to adverse outcome in CLL as well as immune res- 
ponse gene sets (Supplementary Table 4). In another example of a 
previously unrecognized cancer gene, we identified recurrent L162R 
substitutions (n = 11, 2.0%) in IKZF3, targeting a highly conserved 
amino acid (93 out of 100 conservation score). This gene is a key 
transcription factor in B-cell development”, and its upregulation 
has been associated with adverse outcome**”’. 

In addition to sSNVs and sIndels, we characterized somatic copy 
number alterations (CNAs) directly from the WES data (Extended 
Data Fig. 5 and Supplementary Tables 5 and 6). When we accounted 
for all 55 identified driver events, including non-silent sSNVs and 
sIndels in putative CLL genes (n = 44) and recurrent somatic CNAs 
(n = 11), 91.1% of CLLs contained at least one driver. Moreover, 
65.4% of CLLs now harboured at least 2 drivers, and 44.4% at least 
3 drivers, compared with 55.9% and 31.8% were we to exclude the 26 
additional CLL genes. 


Drivers and CLL characteristics 


The larger cohort size also provided statistical power to examine 
associations between genetic alterations and key CLL features. First, 
we examined whether mutations differed between IGHV mutated and 
unmutated subtypes, the two main subtypes of CLL. In agreement with 
the relative clinical aggressiveness of IGHV unmutated CLL, most 
drivers were found in a higher proportion in this subtype (Extended 
Data Fig. 6a). Only three drivers were enriched in IGHV 
mutated CLL (del(13q), MYD88 and CHD2), suggesting a role for 
these specific alterations within the oncogenic process of this subtype. 

Second, since therapy could lead to selection of particular driver 
events, we examined the 33 samples (6.2%, none enrolled on CLL8) 
that had received therapy before sampling. Previous treatment was 
associated with enrichment in TP53 and BIRC3 mutations, del(17p) 
and del(11q), as previously indicated”, as well as in mutated DDX3X 
and MAP2K1, suggesting their selection by therapeutic interventions 
(Extended Data Fig. 6b). 
Third, we examined whether coherent patterns of co-occurrence of 
driver events were evident, limiting our analysis to the 31 drivers with 
>10 affected patients. Of 465 possible pairs, 11 combinations had 
statistically significant high or low co-occurrence (Extended Data 
Fig. 6c, d). As expected, a high degree of co-occurrence was found 
between mutated TP53 and del(17p), and between mutated ATM and 
del(11q). Both mutated ATM and del(11q) significantly co-occurred 
with amp(2p), and associations between the presence of tri(12) 
with mutated BIRC3 and with mutated BCOR were also found. A 
significantly low rate of co-occurrence was seen between del(13q) 
and tri(12). 
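
The pair count and a per-pair test of co-occurrence can be sketched as follows. This is an illustrative sketch only: the text does not state which test was applied, so Fisher's exact test is an assumption here, the function name is hypothetical, and the 2x2 counts in the example are invented. Testing 465 pairs would also call for a multiple-hypothesis correction, which is not shown.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: samples with both drivers, b: first driver only,
    c: second driver only, d: neither driver.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def pmf(x):
        # Hypergeometric probability of x samples carrying both drivers.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = pmf(a)
    # Sum the probabilities of all tables at least as extreme as observed.
    p = sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


if __name__ == "__main__":
    n_drivers = 31
    print("possible pairs:", comb(n_drivers, 2))
    # Invented counts for one driver pair across 538 samples.
    print("P =", fisher_exact_two_sided(20, 10, 25, 483))
```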

Fourth, we examined the temporal sequence of driver acquisition in 
the evolutionary history of CLL. To do this, we computed the cancer- 
cell fraction (CCF) of each mutation across the 538 samples, and 
identified mutations as either clonal or subclonal’’ (58.1% of muta- 
tions classified as subclonal). Both clonal and subclonal sSNVs were 
similarly dominated by C>T transitions at CpG sites (Extended 
Data Fig. 7). 
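
As a rough illustration of how a cancer-cell fraction relates to sequencing data, the sketch below converts a variant allele fraction into a point estimate of CCF using the standard purity and copy-number correction. It is a simplification under stated assumptions, not the procedure used in the study: the function names are hypothetical, normal cells are assumed diploid, and the 0.85 cut-off for calling a mutation clonal is an arbitrary illustrative value.

```python
def cancer_cell_fraction(vaf, purity, tumour_cn=2, multiplicity=1):
    """Point estimate of CCF from a variant allele fraction.

    vaf: fraction of reads supporting the mutation.
    purity: fraction of tumour cells in the sample.
    tumour_cn: total copy number at the locus in tumour cells.
    multiplicity: number of mutated copies per mutated tumour cell.
    """
    # Average number of copies of the locus per cell in the mixture.
    mean_copies = purity * tumour_cn + 2 * (1 - purity)
    ccf = vaf * mean_copies / (purity * multiplicity)
    return min(ccf, 1.0)  # sampling noise can push the estimate above 1


def classify(ccf, clonal_threshold=0.85):
    """Label a mutation clonal or subclonal (threshold is an assumption)."""
    return "clonal" if ccf >= clonal_threshold else "subclonal"


if __name__ == "__main__":
    for vaf in (0.45, 0.10):
        ccf = cancer_cell_fraction(vaf, purity=0.9)
        print(f"VAF={vaf:.2f} CCF={ccf:.2f} -> {classify(ccf)}")
```

A full treatment would carry a posterior distribution over CCF rather than a single value, so that read depth informs the confidence of each call.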

We first classified driver events probably acquired earlier or later 
in the disease course based on the proportion of cases in which the 
driver was found as clonal (Fig. 3a). This large data set further 
enabled the inference of temporal relationships between pairs of 
drivers. We systematically identified instances in which a clonal 
driver was found together with a subclonal driver within the same 
sample, as these pairs reflect the acquisition of one lesion (clonal) 
followed by another (subclonal), providing a temporal ‘edge’ leading 
from the former to the latter”*”°. For each driver, we calculated the 
relative enrichment of out-going edges compared to in-going edges 
to define early, late and intermediary drivers (Supplementary Table 
7). For 23 pairs connected by at least 5 edges, we further established 
the temporal relationship between the two drivers in each pair, and 
thereby constructed a temporal map of the evolutionary trajectories 
of CLL (Supplementary Table 8 and Fig. 3b). This network high- 
lights somatic CNAs as the earliest events with two distinct points of 
departure involving del(13q) and tri(12). It further demonstrates an 
early convergence towards del(11q) and substantial diversity in late 
drivers. Finally, this analysis suggests that in the case of the tumour 
suppressor genes ATM and BIRC3, copy loss precedes sSNVs and 
sIndels in biallelic inactivation. 
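
The edge-counting idea described above can be sketched in a few lines. The sketch only tallies directed clonal-to-subclonal edges and the resulting out-going and in-going counts per driver; the statistical testing of pairs described in the text is not reproduced. Function names and the three toy samples are hypothetical.

```python
from collections import Counter
from itertools import product


def temporal_edges(samples):
    """Count directed edges (clonal driver -> subclonal driver) over samples.

    samples: iterable of dicts mapping driver name to "clonal" or "subclonal".
    """
    edges = Counter()
    for calls in samples:
        clonal = [d for d, status in calls.items() if status == "clonal"]
        subclonal = [d for d, status in calls.items() if status == "subclonal"]
        for early, late in product(clonal, subclonal):
            edges[(early, late)] += 1
    return edges


def degree_balance(edges):
    """Return (out_going, in_going) edge counts per driver."""
    out_going, in_going = Counter(), Counter()
    for (early, late), n in edges.items():
        out_going[early] += n
        in_going[late] += n
    return out_going, in_going


if __name__ == "__main__":
    toy = [
        {"del(13q)": "clonal", "ATM": "subclonal"},
        {"del(13q)": "clonal", "ATM": "subclonal", "SF3B1": "subclonal"},
        {"ATM": "clonal", "del(13q)": "subclonal"},
    ]
    out_going, in_going = degree_balance(temporal_edges(toy))
    for driver in sorted(set(out_going) | set(in_going)):
        print(driver, "out:", out_going[driver], "in:", in_going[driver])
```

A driver with mostly out-going edges would be read as early and one with mostly in-going edges as late, which is the logic behind the early, intermediate and late categories.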


Impact on clinical outcome 


We examined whether the presence of any of the drivers detected in at 
least 10 of the 278 pre-treatment CLL8 samples was associated with 
impact on clinical outcome (Fig. 4a and Extended Data Figs 8 and 9; 
the genomics analysis team was blinded to the clinical outcome data). 
Previous investigations suggested an impact for 7 CLL genes (SF3B1, 
ATM, TP53, XPO1, EGR2, POT1 and BIRC3)**-*°. We found shorter 
progression-free survival (PFS) associated only with TP53 and SF3B1 
mutations. Of the newly identified recurrent lesions evaluated (MGA, 
BRAF and RPS15), we observed a shorter PFS with mutated RPS15 
(Bonferroni P = 0.024). 

The presence of a detectable pre-treatment subclonal driver has 
been previously associated with shorter remissions in patients treated 
with heterogeneous therapies’. In the CLL8 cohort, we again found 
that the presence of a pre-treatment subclonal driver was associated 
with a significantly shorter PFS (hazard ratio (HR) 1.6 (95% confid- 
ence interval (CI) 1.2-2.2), P= 0.004). This association remained 
significant in both the FC (fludarabine and cyclophosphamide) and 
FCR (fludarabine, cyclophosphamide and rituximab) treatment arms 
(Fig. 4b), with a non-significant trend when IGHV mutation status 
was added to a multivariable model in addition to the treatment arm 
(1.3 (0.9-1.9), P = 0.102). 
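
The progression-free survival comparisons above rest on Kaplan-Meier estimates. The following is a minimal sketch of that estimator, with a hypothetical function name and invented follow-up times; it does not compute the hazard ratios or log-rank P values reported in the text.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time for each patient.
    events: 1 (or True) if progression was observed, 0 if censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_at_t = sum(1 for tt, _ in data if tt == t)
        n_events = sum(1 for tt, ev in data if tt == t and ev)
        if n_events:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_at_t  # events and censored cases both leave the risk set
        i += n_at_t
    return curve


if __name__ == "__main__":
    # Invented data: years to progression, with one censored patient.
    print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```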


Clonal evolution at disease relapse 


To define clonal evolution in disease relapse, we performed WES on 
matched samples collected at the time of relapse from 59 of 278 CLL8 
Figure 3 | Inferred evolutionary history of CLL. a, The proportion in which a 
recurrent driver is found as clonal or subclonal across the 538 samples is 
provided (top), along with the individual cancer cell fraction (CCF) values for 
each sample affected by a driver (tested for each driver with a Fisher’s exact test, 
comparing to the cumulative proportions of clonal and subclonal drivers 
excluding the driver evaluated). Median CCF values are shown (bottom, bars 
represent the median and interquartile range for each driver). b, Temporally 
direct edges are drawn when two drivers are found in the same sample, one in 
clonal and the other in subclonal frequency. These edges are used to infer 

the temporal sequences in CLL evolution, leading from early, through inter- 
mediate to late drivers. Note that only driver pairs with at least five connecting 
edges were tested for statistical significance and only drivers connected by at 
least one statistically significant edge are displayed (see Supplementary 
Methods and Supplementary Tables 6 and 7). 


subjects (Supplementary Tables 9 and 10). We observed large clonal 
shifts between pre-treatment and relapse samples in the majority of 
cases (57 of 59), thus demonstrating that CLL evolution after therapy 
is the rule rather than the exception (Fig. 5a). The relapse clone was 
already detectable in pre-treatment WES in 18 of 59 (30%) cases, 
demonstrating that the study of pre-treatment diversity anticipates 
the future evolutionary trajectories of the relapsed disease™*. By targeted 
deep sequencing, we screened for relapse drivers in 11 of the 41 
pre-treatment samples in which WES did not detect the relapse 
driver. In 7 of these 11 CLLs, at least one relapse driver was detected in 
the pre-treatment sample (Supplementary Table 10). 

We further compared the pre-treatment and relapse CCF for each 
driver, and observed three general patterns. First, tri(12), del(13q) and 
del(11q), suggested as early drivers (Fig. 3b), tended to remain stably 
clonal despite marked, often branched, evolution (Fig. 5b (CLL cases 
GCLL-115 and GCLL-307), Fig. 5c, top row, and Extended Data 
Fig. 10). This confirms that these are indeed early events probably 
shared by the entire malignant population. Second, TP53 mutations 
and del(17p) demonstrated increases in CCF upon relapse, suggesting 
Figure 4 | Associations of CLL drivers with clinical outcome. a, Kaplan-Meier 
analysis (with logrank P values) for putative drivers with associated 
impact on progression-free survival (PFS) or overall survival (OS) probabilities 
in the cohort of 278 patients that were treated as part of the CLL8 trial. For 
candidate CLL genes tested here for the first time regarding impact on outcome, 
a Bonferroni P value is also shown. b, Presence of a subclonal driver is 
associated with a lower probability of PFS, in both the FC and FCR arms, 
and a trend towards shorter OS. 


a fitness advantage under therapeutic selection (Fig. 5b (GCLL-27) 
and Fig. 5c, middle row). The novel driver IKZF3 increased in CCF 
in 3 of 4 relapse cases (and remained clonal in the fourth), supporting 
the suggestion that these mutations probably enhance fitness. 
Third, mutations in SF3B1 and ATM, identified as temporally inter- 
mediate or late drivers, seemed just as likely to decline in CCF as they 
were to increase (Fig. 5c, bottom row). These results suggest that 
within this therapeutic context such mutations do not provide the 
same strength of fitness advantage compared to TP53 disruption. In 
addition, we observed nine instances each of multiple distinct alleles 
of ATM and SF3B1 mutations within the same CLL (for example, 
GCLL-307 in Fig. 5b), indicating convergent evolution of these late- 
occurring CLL drivers. 

This series also informs us regarding the mutagenesis of the tumour 
suppressor genes TP53 and ATM, where biallelic inactivation is com- 
mon. In the case of ATM, we typically find a fixed clonal del(11q22.3) 
and subclones harbouring sSNVs affecting the other allele that shift in 
CCF over time (for example, GCLL-307). We confirmed that the 
breakpoints of somatic CNAs in matched relapse and pre-treatment 
samples were highly consistent, probably representing the same dele- 
tion event. These data suggest that mono-allelic ATM deletion pro- 
vides a fitness advantage that enables the expansion of the malignant 
population with subsequent growth of multiple co-existing clones 
that harbour a ‘second hit’ (genetic disruption of the remaining allele).
Thus, while a biallelic lesion is clearly selected for (Extended Data
Fig. 6c), the longitudinal data support the temporal analysis (Fig. 3b)
in which del(11q) precedes ATM mutations, reflecting the higher
likelihood of a focal copy number loss compared with a deleterious
point mutation. In contrast, we consistently observed a concordant
rise of del(17p) and TP53 mutations in all 12 CLLs harbouring
both of these events, and none of these cases exhibited multiple dis- 
tinctly evolving TP53 mutated clones. These observations suggest that 
a true biallelic inactivation of TP53 is required, and indeed, across the 
538 CLL samples, the odds ratio for co-occurrence of del(17p) and 
TP53 mutation was far greater than the odds ratio for co-occurrence 
of del(11q) and ATM mutation (97.22 versus 10.99, respectively). 
These observations are in agreement with a recent analysis that 
suggested that with the exception of a few genes such as TP53, 
tumour suppressor genes in sporadic cancers are haploinsufficient 
to begin with, and that the second hit only further builds on this 
fitness advantage.
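The odds ratios quoted above are the standard co-occurrence statistic for a 2×2 table of two events. As a minimal sketch in Python, the calculation looks like the following; the function name `odds_ratio` and the counts are hypothetical and chosen only for illustration (the per-cell counts are not given in this section), and this is not necessarily how the authors computed their values.

```python
def odds_ratio(both, only_a, only_b, neither):
    """Odds ratio for co-occurrence of two events from a 2x2 table.

    both    -- samples carrying event A and event B
    only_a  -- samples carrying A but not B
    only_b  -- samples carrying B but not A
    neither -- samples carrying neither event
    """
    if only_a == 0 or only_b == 0:
        raise ValueError("odds ratio is undefined when an off-diagonal cell is zero")
    return (both * neither) / (only_a * only_b)


# Hypothetical counts summing to 538 samples; these are NOT the study's data.
example = odds_ratio(both=20, only_a=10, only_b=15, neither=493)
print(f"odds ratio = {example:.2f}")
```

An odds ratio far above 1 means the two lesions occur together much more often than expected if they were independent, which is the sense in which del(17p) and TP53 mutation are more tightly coupled than del(11q) and ATM mutation.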


Conclusions 


This study of WES in CLL enabled a comprehensive identification of 
putative cancer-associated genes in CLL, generating novel hypotheses 
regarding the biology of this disease, and identifying previously unre- 
cognized putative CLL drivers such as RPS15 and IKZF3. The detailed 
characterization of the compendium of driver lesions in cancer is of 


particular importance as we strive to develop personalized medicine, 
because driver genes may inform prognosis (for example, RPS15 
mutations) and identify lesions that may be targeted by therapeutic 
intervention (for example, MAPK pathway mutations and specifically 
the unexpected enrichment for non-canonical BRAF mutations). 
Through the inclusion of samples collected within a landmark clinical 
trial with mature outcome data, we could further study the impact of 
genetic alterations in the context of the current standard-of-care 
front-line therapy. As targeted therapy is rapidly transforming the 
treatment algorithms for CLL, future studies will be required to re-examine
these associations in this context.

An important benefit of the larger cohort size is the enhanced ability 
to explore relationships between driver lesions based on patterns of 
their co-occurrence. Focusing on temporal patterns of driver acquisi- 
tion—based on the distinction between clonal versus subclonal altera- 
tions in a cross-sectional analysis—we derived a temporal map for the 
evolutionary history of CLL. In the context of relapse after first-line 
fludarabine-based therapy, we note highly frequent clonal evolution, 
and that the future evolutionary trajectories were already anticipated in 
the pre-treatment sample in one-third of cases with WES. 

This study provides an indication of the potential benefits to be 
gained by applying novel genomic technologies to growing 
cohort sizes across leukaemias: the continued discovery of novel can- 
didate cancer genes, the deeper integration of genetic analysis with 
standardized clinical information (collected within clinical trials) to 
inform prognosis and therapy, and the ability to delineate the complex
network of relationships between cancer drivers in the history and 
progression of the malignant process. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
Extended Data Figure 3 | RNA-seq expression data for candidate CLL genes
and targeted candidate driver validation. a, Matched RNA-seq and WES
data were available for 156 CLLs (103 CLLs previously reported and 53 CLLs
from the ICGC studies). From the WES of these 156 cases, we identified
318 driver mutations (sSNVs and sIndels). For each site, we quantified the
number of alternative reads corresponding to the somatic mutation in matched
RNA-seq data. We subsequently counted the number of instances in which
a mutation was detected (‘detected’) and compared it to the number of
instances in which mutation detection had >90% power based on the allelic
fraction in the WES and the read depth in the RNA-seq data (‘powered’).
Overall, we detected 78.1% of putative CLL gene mutations at sites that had
>90% power for detection in RNA-seq data. b, Targeted orthogonal validation
(Access Array System, Fluidigm) was performed for 71 mutations (sSNVs
and sIndels) in putative CLL genes, affecting 47 CLLs from the CLL8 cohort
(selected on the basis of sample availability). With a mean depth of coverage of
7,472×, 65 of the 71 mutations (91.55%) validated, with a higher variant
allele fraction compared with normal sample DNA (binomial P < 0.01).
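The notion of a site being ‘powered’ for detection can be illustrated with a simple binomial model: given the read depth at a site and the expected allelic fraction, what is the probability of seeing at least some minimum number of alternative reads? The sketch below is an assumption-laden illustration, not the authors' pipeline; in particular the `min_alt_reads` cutoff and the function names are hypothetical, and real RNA-seq allelic fractions can deviate from the WES value.

```python
from math import comb


def detection_power(depth, allelic_fraction, min_alt_reads=2):
    """P(at least min_alt_reads alternative reads) under Binomial(depth, allelic_fraction).

    min_alt_reads is a hypothetical detection cutoff chosen for illustration.
    """
    p_below_cutoff = sum(
        comb(depth, k) * allelic_fraction**k * (1 - allelic_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below_cutoff


def is_powered(depth, allelic_fraction, threshold=0.9, min_alt_reads=2):
    """True when the detection power exceeds the threshold (>90% by default)."""
    return detection_power(depth, allelic_fraction, min_alt_reads) > threshold


# A well-covered site with a clonal heterozygous mutation versus a shallow, subclonal one.
print(is_powered(depth=20, allelic_fraction=0.3))
print(is_powered(depth=5, allelic_fraction=0.1))
```

Restricting the comparison to powered sites avoids counting a mutation as ‘not detected’ when the RNA-seq coverage was simply too low to see it.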
Extended Data Figure 4 | Gene mutation maps for candidate CLL genes. a-v, Individual gene mutation maps are shown for all newly identified candidate CLL
cancer genes not included in Fig. 2. The plots show mutation subtype (for example, missense, nonsense) and position along the gene. 
Extended Data Figure 5 | CLL copy number profiles. Copy number profile across 538 CLLs detected from WES data from primary samples
(see Supplementary Methods).
Extended Data Figure 7 | Mutation spectrum analysis, clonal versus
subclonal sSNVs. The spectrum of mutation is shown for the clonal and
subclonal subsets of coding somatic sSNVs across WES of 538 samples. The
rate is calculated by dividing the number of trinucleotides with the specified
sSNVs by the covered territory containing the specified trinucleotide.
Both clonal and subclonal sSNVs were similarly dominated by C > T
transitions at CpG sites. Thus, this mutational process that was previously
associated with ageing not only predates oncogenic transformation (since
clonal mutations will be highly enriched in mutations that precede the
malignant transformation) but also is the dominant mechanism of malignant
diversification after transformation in CLL.
Extended Data Figure 8 | The CLL driver landscape in the CLL8 cohort.
Somatic mutation information shown across the 55 candidate CLL cancer
genes and recurrent somatic CNAs (rows) for 278 CLL samples collected from
patients enrolled on the CLL8 clinical trial primary that underwent WES
(columns). Recurrent somatic CNA labels are listed in blue, candidate CLL
cancer genes are listed in bold if previously identified in Landau et al., and with
an asterisk if newly identified in the current study.
Extended Data Figure 9 | CLL8 patient cohort clinical outcome (from 278
patients) information by CLL cancer gene. Kaplan-Meier analysis (with
logrank P values) for putative drivers not associated with significant impact on
progression-free survival (PFS) or overall survival (OS) in the cohort of 278
patients that were treated as part of the CLL8 trial. For candidate CLL genes
tested here for the first time regarding impact on outcome, a Bonferroni P value
is also shown.
Extended Data Figure 10 | Comparison of pre-treatment and relapse cancer
cell fraction (CCF) for non-silent mutations in candidate CLL genes
across 59 CLLs. For each CLL gene mutated across the 59 CLLs that were
sampled longitudinally, the modal CCF is compared between the pre-treatment
and relapse samples. CCF increases (red), decreases (blue) or stable CCF (grey)
over time are shown (in addition to CLL genes shown in Fig. 5). A significant
change in CCF over time (red or blue) was determined if the 95% CI of
the CCF in the pre-treatment and relapse samples did not overlap.
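The non-overlap rule described in this caption can be sketched in a few lines of Python. The function name `classify_ccf_change` and the interval values are hypothetical; the confidence intervals themselves would come from the upstream CCF estimation, which is not reproduced here.

```python
def classify_ccf_change(pre_ci, relapse_ci):
    """Classify a CCF change from two (low, high) 95% confidence intervals.

    Returns "increase" or "decrease" only when the intervals do not overlap,
    and "stable" otherwise.
    """
    pre_low, pre_high = pre_ci
    relapse_low, relapse_high = relapse_ci
    if relapse_low > pre_high:
        return "increase"
    if relapse_high < pre_low:
        return "decrease"
    return "stable"


# Hypothetical intervals for illustration only.
print(classify_ccf_change((0.05, 0.20), (0.85, 1.00)))  # subclonal driver that expands
print(classify_ccf_change((0.90, 1.00), (0.88, 1.00)))  # driver that stays clonal
```

Requiring non-overlapping intervals is a conservative criterion: small shifts in CCF that fall within estimation uncertainty are reported as stable.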
Single cell activity reveals direct electron transfer in methanotrophic consortia


Shawn E. McGlynn, Grayson L. Chadwick, Christopher P. Kempes & Victoria J. Orphan


Multicellular assemblages of microorganisms are ubiquitous in nature, and the proximity afforded by aggregation is 
thought to permit intercellular metabolic coupling that can accommodate otherwise unfavourable reactions. Consortia 
of methane-oxidizing archaea and sulphate-reducing bacteria are a well-known environmental example of microbial
co-aggregation; however, the coupling mechanisms between these paired organisms are not well understood, despite
the attention given them because of the global significance of anaerobic methane oxidation. Here we examined the 
influence of interspecies spatial positioning as it relates to biosynthetic activity within structurally diverse uncultured 
methane-oxidizing consortia by measuring stable isotope incorporation for individual archaeal and bacterial cells to 
constrain their potential metabolic interactions. In contrast to conventional models of syntrophy based on the passage of 
molecular intermediates, cellular activities were found to be independent of both species intermixing and distance 
between syntrophic partners within consortia. A generalized model of electric conductivity between co-associated 
archaea and bacteria best fit the empirical data. Combined with the detection of large multi-haem cytochromes in the 
genomes of methanotrophic archaea and the demonstration of redox-dependent staining of the matrix between cells in
consortia, these results provide evidence for syntrophic coupling through direct electron transfer.


Ecological processes are fundamentally spatial in nature: those governing
microbial organisms are no exception. The ubiquity and
impact of biofilms, consortia, and other multicellular assemblages
in the fields of environmental microbiology, industry, and medicine
demonstrate the necessity of relating the spatial position of cells to
metabolic activity and community function. Theoretical modelling
and laboratory experiments with artificial co-cultures have offered
fundamental insights regarding the effect of spatial architecture on the 
fitness and physiology of interacting populations, but studying the 
influence of spatial organization on uncultured microorganisms has 
remained a long-standing challenge. To translate information learned 
from modelling and derived laboratory results to systems found in 
nature requires new methodological strategies that are capable of elu- 
cidating microbial structure-activity relationships. 

Here, fluorescence in situ hybridization and nanoscale secondary
ion mass spectrometry (FISH-nanoSIMS) combined with ¹⁵N stable
isotope probing was used to investigate how single-cell metabolic 
activity is related to cellular configuration in highly structured, 
bi-species microbial consortia in environmental samples (Fig. 1 and 
Extended Data Fig. 1). We applied these methods to empirically test 
long-standing hypotheses regarding the metabolic interactions 
underpinning the environmentally important microbial symbiosis 
responsible for the anaerobic oxidation of methane (AOM) in ocean 
sediments. Discovered over a decade ago, these consortia consist
of multiple lineages of as yet uncultured anaerobic methanotrophic
archaea (ANME) and sulfate-reducing Deltaproteobacteria (SRB),
and form diverse aggregate configurations within methane seep
sediments worldwide. Initial whole-aggregate stable isotope depth
profiles acquired by FISH-SIMS offered isotopic evidence for the
involvement of ANME-SRB consortia in anaerobic methanotrophy and
documented broad anabolic activity patterns amongst different
AOM aggregate morphologies, but key questions regarding the
mechanism of this syntrophic association remain. There are a number
of hypotheses regarding the metabolic interactions underlying this
enigmatic methane-fuelled symbiosis, ranging from classical syntrophy
based on hydrogen, formate or acetate to less conventional
forms of metabolite or reducing equivalent exchange (for example,
methanethiol and disulfide). Whether there is a universal mechanism
controlling ANME-SRB-mediated methane oxidation, or whether
different archaeal-bacterial AOM consortia use a variety of syntrophic
strategies, is still an unresolved question in the field.

Independent of the specific mechanism, a key prediction regarding
syntrophic associations of microbes is that the spatial arrangement of
paired organisms can greatly influence the metabolic activity of
individual cells. In these cases, homogeneous species mixing is
expected to facilitate efficient transfer of diffusible intermediates
and lead to enhanced metabolic activity. Similarly, at the single-cell
level, syntrophic partners in immediate proximity to one another
are expected to gain a greater metabolic benefit in comparison to cells
that lack a syntrophic interface. For the AOM system, these predictions
are captured in previously published models and also in
examples presented here (Extended Data Fig. 2), which are based on
syntrophic transfer or commensal passage of a diffusible intermediate.
Notably, these modelling results are at odds with the frequent
documentation of large environmental consortia with spatially
segregated ANME and SRB cells, and it remains unclear how these
configurationally segregated consortia persist, and often dominate, if
they are at a disadvantage to those that are well mixed. The
discrepancies between in situ observations and model predictions
motivate a series of hypotheses that are testable with single-cell
biosynthetic activity measurements of AOM consortia: (1) within highly
segregated consortia, the vast majority of activity will be restricted to
cells at interfaces between syntrophic partners, and (2) segregated
syntrophic consortia will have lower total activity levels on average
Figure 1 | Examples of AOM consortia identified by FISH and paired
anabolic activity measurement via nanoSIMS. a, FISH-identified consortia
showing archaeal cells (green) and Deltaproteobacteria (pink). The top two
panels represent consortia of ANME-2c or 2b paired with Deltaproteobacteria.
The lower four panels show ANME-2c archaea paired with the seep-specific
deltaproteobacterial group, SEEP-SRB1a. Scale bars, 3 μm. b, Corresponding
nanoSIMS ion images of biomass show ¹⁴N¹²C⁻ ion images with warmer
colours indicating higher secondary ion counts (maximum 1,500 counts).
c, Single cell activities are measured as ¹⁵N atom percentages for regions of
interest (ROI) representing the FISH-identified archaea and bacteria in each
consortium. Lighter shaded cells are more enriched in ¹⁵N, which corresponds
with higher levels of anabolic activity and ¹⁵NH₄⁺ assimilation. Representative
aggregates were chosen from the larger data set composed of 62 aggregates.


relative to well-mixed consortia. If these patterns of activity are not 
observed, then the interactions driving the symbiosis may be distinct 
from the classical view of syntrophy occurring through the exchange 
of a diffusible chemical intermediate. 

These hypotheses were evaluated for phylogenetically diverse
ANME-2 archaea (belonging to the order Methanosarcinales) and
partner Deltaproteobacteria using high-resolution biosynthetic activity
measurements paired with FISH-based microbial identification,
giving us the ability to catalogue cell activity, phylogeny, and cellular
position within consortia. ¹⁵N-ammonium assimilation, used as a
marker for biosynthetic activity, was determined for 5,453 FISH-identified
cells within 62 consortia from a deep-sea sediment incubation
showing AOM activity, allowing an assessment of the biosynthetic
Figure 2 | Activity relationships between archaea and bacteria in AOM
consortia. a, Population-level average of bacterial activity versus archaeal
activity for individual AOM consortia revealing a positive correlation in activity
between paired partners. Individual AS aggregates shown in blue (n = 41,
R² = 0.47), AD in green (n = 21, R² = 0.62). The 1:1 line is shown in black.
b, Total consortium activity plotted against the join index (J) of spatial
intermixing between archaeal and bacterial partners, where lower J values
represent greater mixing (AS in blue: n = 41, R² = 0.03; AD in green: n = 21,
R² = 0.00). c, d, Activity of single archaeal or bacterial cells, respectively, plotted
against distance to the nearest syntrophic partner for the AS data set
(c, n = 1,967, R² = 0.02; d, n = 2,067, R² = 0.02). The equivalent analysis for
AD is provided in Extended Data Fig. 6. In all plots R² values and solid lines
represent linear regressions of the plotted data. Dashed lines illustrate the 95%
confidence intervals in slopes and intercepts of the linear regressions.


activity of each cell as it relates to spatial positioning of adjacent 
syntrophic partners (Fig. 1). Two different groups of co-occurring 
AOM consortia in the incubation were analysed: 41 archaeal aggregates 
paired with the specific Desulfobacteraceae lineage SEEP-SRB1a, which
has been observed as a common bacterial partner of ANME-2 worldwide
(AS; ANME-2c: SEEP-SRB1a), and 21 archaeal aggregates paired
with other, non-SEEP-SRB1a Deltaproteobacteria (AD; ANME-2c or
ANME-2b: Deltaproteobacteria) (Fig. 1, Extended Data Fig. 3 and
Supplementary Information).


Spatial patterns of cellular activity 


FISH analyses of AS and AD aggregates revealed nearly equal abundances
of archaeal and bacterial cells within each consortium (Extended
Data Fig. 3). The average activity (¹⁵N enrichment) of archaeal and
bacterial populations from each consortium was found to be correlated
and close to the 1:1 line, suggesting a beneficial metabolic interaction
(Fig. 2a), although the AS bacteria were on average slightly more active 
than their archaeal partners (Extended Data Fig. 4). The AS and AD 
consortia analysed in this study occupied a considerable range of both 
biosynthetic activity and cell number (Extended Data Fig. 3). In order to 
relate this range of activity to the amount of partner intermixing, we 
developed a quantitative metric, J, to describe aggregate spatial mixing 
(Extended Data Fig. 5 and Supplementary Discussion). J values for 
aggregates represented a range of mixing, and permutation tests revealed 
that the majority of AS and AD consortia exist in conformations where 
partnering cells were more segregated than could be explained by 
random chance, consistent with patterns emerging from binary cell 
division (Extended Data Fig. 5). Across this conformational variability 
however, the degree of mixing between cell types (J values between 0.9 
and 4.8) did not influence the average biosynthetic activity of the entire 
consortium (Fig. 2b), indicating the overall activity of well-mixed con- 
sortia (low J values) was not greater than those that were segregated. 
When examining the ¹⁵N-based activity patterns of individual
archaeal and bacterial cells within a consortium, each cell’s activity
was found to be unrelated to proximity to the nearest archaeal or
bacterial partner (Fig. 2c, d and Extended Data Fig. 6). Similarly, in 
the vast majority of consortia, the cells located at syntrophic interfaces 
were not significantly more active than those surrounded by the same 
cell type (Supplementary Tables 1-4, and Extended Data Fig. 7). 
These single cell measurements are consistent with previously published
results of intact AOM consortia, where a correlation between
bulk ¹⁵N enrichment and natural abundance δ¹³C (a proxy for methanotrophic
ANME biomass) in SIMS depth profiles was not detected.
We also examined whether archaeal or bacterial cellular activity was
related to the external environment; however, no significant correlation
with distance to the aggregate-environment interface was
observed (Extended Data Fig. 8). These observations are contrary to
conventional diffusive model results for syntrophic AOM partners
in scenarios where one or both partners are dependent on the sur- 
rounding environment (Extended Data Fig. 2 shows diffusion model 
results for a broad range of diffusion and activity rates). Other poten- 
tial relationships between total consortia activity and aggregate size 
approximated from total cell number, as well as the relative ratio of 
ANME and SRB cells in each consortium were also not significant in 
our data set (R² values < 0.04).
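The permutation test used to ask whether partner cells are more segregated than expected by chance can be illustrated with a toy statistic. The sketch below does not implement the J metric (its definition is in the Supplementary Discussion and is not reproduced here); instead it uses a stand-in, the fraction of cells whose nearest neighbour is of the same type, and shuffles cell labels to build a null distribution. All names and coordinates are hypothetical.

```python
import math
import random


def same_type_neighbour_fraction(coords, labels):
    """Fraction of cells whose nearest neighbour carries the same label."""
    same = 0
    for i, (xi, yi) in enumerate(coords):
        nearest = min(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.hypot(coords[j][0] - xi, coords[j][1] - yi),
        )
        same += labels[nearest] == labels[i]
    return same / len(coords)


def segregation_p_value(coords, labels, n_perm=999, seed=0):
    """One-sided permutation p-value: is the aggregate more segregated than chance?"""
    rng = random.Random(seed)
    observed = same_type_neighbour_fraction(coords, labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # positions stay fixed, cell identities are permuted
        if same_type_neighbour_fraction(coords, shuffled) >= observed:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical, fully segregated aggregate: five archaea ("A") next to five bacteria ("B").
coords = [(float(x), 0.0) for x in range(5)] + [(float(x), 0.0) for x in range(10, 15)]
labels = ["A"] * 5 + ["B"] * 5
observed, p_value = segregation_p_value(coords, labels)
print(observed, p_value)
```

Keeping cell positions fixed and permuting only the labels preserves the aggregate geometry, so the null distribution reflects random assignment of cell type rather than a change in shape.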


Diffusion versus direct electron transfer 


The distance-independent trends in cellular activity presented in 
Fig. 2b-d, Extended Data Fig. 6, and Supplementary Tables 1-4 are 
in stark contrast to what is predicted in the case of syntrophic 
exchange or commensal sharing of a diffusible intermediate
(see diffusion model results in Extended Data Fig. 2 and Supple- 
mentary Information). To explore diffusion-independent scenarios 
which might explain our empirical data, we constructed a second 
generalized model that captures the basic features of direct interspecies
electron transfer. This model is based on electron export by one
cell type and electron import by a partner within the consortia, where 
electrons are able to freely flow across the entire aggregate with a 
dependence on electric potential (Supplementary Information). 
Consistent with our data, these models predict a reduction in the 
overall correlation between biosynthetic activity and aggregate geo- 
metry (J metric), especially as the electric conductivity of the aggregate 
is increased relative to the growth rates (Extended Data Fig. 9). In 
particular, total aggregate activity is relatively insensitive to how well 
mixed the aggregate is, and single-cell activities are less correlated 
with the distance to either the aggregate surface or the syntrophic 
partner (Extended Data Fig. 9). In both models (Extended Data 
Figs 2 and 9), the empirical results presented in Fig. 2 are best matched 
with an increased ratio of metabolic exchange rates relative to cellular 
activity rates and both models converge to similar results for high 
rates of transport. For a diffusible intermediate, this ratio would need 
to be larger than predictions from known intermediate diffusivities,
our observed growth rates, and the expected growth yields. 
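The role of the transport-to-activity ratio can be seen in a deliberately simplified toy calculation. The sketch below is not the model from the Supplementary Information; it assumes a one-dimensional, semi-infinite domain at steady state with first-order uptake, for which the intermediate decays as exp(-x/λ) with λ = sqrt(D/k). The function names and parameter values are hypothetical and unitless.

```python
import math


def decay_length(diffusivity, uptake_rate):
    """Characteristic length sqrt(D/k) for steady-state diffusion with first-order uptake."""
    return math.sqrt(diffusivity / uptake_rate)


def relative_concentration(distance, diffusivity, uptake_rate):
    """c(x)/c(0) = exp(-x / lambda) in a semi-infinite 1D domain."""
    return math.exp(-distance / decay_length(diffusivity, uptake_rate))


# Hypothetical values: the same distance from a partner under slow and fast transport.
distance = 5.0
for diffusivity in (1.0, 100.0, 1e6):
    ratio = relative_concentration(distance, diffusivity, uptake_rate=1.0)
    print(f"D/k = {diffusivity:g}: relative supply at x = {distance} is {ratio:.3f}")
```

When transport is slow relative to uptake, supply falls off steeply with distance from the partner; when it is fast, the decay length exceeds the aggregate size and activity becomes effectively distance-independent, which is the regime both models need in order to match the data.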

These model results related to the J metric illustrate key geometric 
and mechanistic differences beyond those stemming from spatial 
gradients. We find that in the low transport regimes for both the 
diffusive and electric conductivity models, the total activity of con- 
sortia is strongly related to the overall mixing between the two part- 
ners, but with opposite and somewhat unanticipated outcomes. As 
predicted, the slow relative transport diffusion scenario shows the 
highest activity associated with well-mixed consortia (low J values);
however, the electron conductivity model indicates higher levels of
activity in more segregated consortia (high J values) (Extended Data
Fig. 9a). This prediction arises because our conductive treatment of
the consortia relies on the global electric potential for each consortium,
which is strongest when the electron producing and consuming cells
are spatially segregated, maximizing polar charge separation. It
should also be noted that mechanisms of electron diffusion would
produce relative transport rates sufficient for matching the observed
equable activity patterns (Supplementary Discussion).
Multi-haem cytochrome genes in ANME-2 genomes


Motivated by these modelling results in which direct electron transfer 
apparently relaxes spatial controls on aggregate activity in agreement 
with our single cell observations, we analysed available ANME-2 gen- 
omes to determine whether there were signs of this alternative mode of 
syntrophy as has been previously suggested for ANME-1 (ref. 29). 
Remarkably, the genomes of two recently sequenced methanotrophic
archaea (ANME-2a and ANME-2d), as well as a reconstructed
metagenomic bin corresponding to ANME-2b (data not shown),
were each found to encode large multi-haem cytochromes (MHCs),
including the largest described from an archaeon to date (34 haems)
(Fig. 3a, b). A subset of these previously overlooked MHCs occur fused
with a single putative S-layer domain which appears to be a homologue 
of the S-layer protein in Methanosarcina acetivorans (Fig. 3a), suggest- 
ing MHC export from the cell and incorporation into the archaeal 
S-layer. The occurrence of MHCs of this size encoded in a genome is 
rare even in bacteria, and those that do occur are almost exclusively 
found in organisms known to conduct extracellular electron transfer 
such as Geobacter and Shewanella, species which serve as model
organisms for the process.


Cytochrome reactive staining in consortia 


To test the possibility that MHCs are positioned between cells within
consortia as electron conduits, the cytochrome reactive histochemical stain
3,3'-diaminobenzidine (DAB) was applied to AOM consortia recovered
directly from sediment. Treatment with DAB and H₂O₂ followed
by post fixation with OsO₄ resulted in the staining of: (1) the cellular
membranes of both syntrophic partners, (2) some intra-cellular membrane
invaginations of paired Deltaproteobacteria, and (3) the extracellular
space between cells within consortia (Fig. 3c, d). DAB staining
with the addition of H₂O₂ was observed in many, but not all aggregates
in the preparation, suggesting the possibility of phylogenetic or phenotypic
variation in extracellular MHC production within the sediment-hosted
AOM consortia. No visible staining was observed in control
experiments without H₂O₂ (Fig. 3e, f). As DAB is known to react with
redox active transition metal ions (including those bound by haem
groups within cytochromes) in the presence of H₂O₂, these results
are consistent with the localization of the respiratory chain in the cellular
membrane for each organism, and also with the presence of haem
proteins capable of redox activity in the space between cells in consortia.


A model for direct electron transfer 


The electron microscopy results reported above, together with the
presence of the large MHCs in all available ANME-2 genomes, suggest
that extracellular electron transfer may be an important feature
of the anaerobic methanotroph lifestyle. Based on this finding, and the
lack of genomic evidence presented for other syntrophic models, we
propose the catabolic model for AOM coupled to extracellular electron
transfer depicted in Fig. 4. Using known biochemical coupling
mechanisms in methanogens, the oxidation of one mole of methane
can result in four moles of reduced methanophenazine. We propose 
that these methanophenazines are oxidized by an integral membrane 
protein (for example, cytochrome b), with electrons being transferred 
onto an initial MHC for transport from the membrane to the S-layer. 
Tandem proteins 2566125773 and 2566125774 in the ANME-2a gen- 
ome encode a predicted formate dehydrogenase-related cytochrome b 
and an 11-haem multi-haem cytochrome, which are possible candi- 
dates for this methanophenazine:cytochrome c oxidoreductase step. 
The MHC/S-layer fusion proteins depicted in Fig. 3a could then be 
used for electron transport across the S-layer. Finally, large extracel- 
lular cytochromes such as the 31 CxxCH motif containing protein 
2566123495 and numerous other small MHC proteins could be used 
to confer electrical conductivity to the exopolymer matrix between the 
ANME-2 archaea and their SRB partners, similar to the case of MHC
proteins thought to facilitate growth of thick Geobacter biofilms. As
formulated here, this proposed metabolic pathway could potentially 
Figure 3 | Multi-haem cytochrome genes, genomes and TEM visualization
of haem group reactivity in representative ANME-SRB consortia. a, Five 
MHCs with predicted S-layer domains from reconstructed ANME-2 genomes. 
Cytochrome c binding motif sites (CxxCH) are indicated by red dots, 
alternative binding motifs (CxxxCH and CxxxxCH) are shown in green. 
Putative S-layer domains and potential PGF-pre-PGF archaeosortase 
recognition domains as predicted by the NCBI conserved domain database are 
shown in blue and purple, respectively. Protein schematics are simply 
positioned to show compositional similarity, not sequence alignment. Each 
vertical tick-mark denotes 250 amino acids. The gene identifier numbers are 


result in the net translocation of ~2 H⁺ per CH₄ oxidized by the
ANME-2 archaea, with some uncertainty due to the exact stoichiometry
of the proton and sodium pumping complexes involved. This
low proton efflux per substrate used fits well with the small thermodynamic
free energies associated with anaerobic oxidation of methane
and the slow growth rates of these organisms.
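The electron bookkeeping behind the statement that one mole of methane yields four moles of reduced methanophenazine can be written out explicitly. The sketch below uses the half-reaction given in the Fig. 4 legend and assumes, as is standard for this membrane carrier, that each Mp to MpH₂ reduction takes up two electrons and two protons.

```latex
\begin{align}
\mathrm{CH_4 + 2\,H_2O} &\rightarrow \mathrm{CO_2 + 8\,e^- + 8\,H^+} \\
\mathrm{Mp + 2\,e^- + 2\,H^+} &\rightarrow \mathrm{MpH_2} \\
n_{\mathrm{MpH_2}} &= \frac{8\ \mathrm{e^-\ per\ CH_4}}{2\ \mathrm{e^-\ per\ MpH_2}} = 4
\end{align}
```

The eight electrons released per methane are therefore passed to the extracellular electron transfer chain as four two-electron packets.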

Together, the evidence from our spatially resolved analysis of cellular 
activity, genomic observations, and electron microscopy experiments 
is highly suggestive of direct interspecies electron transfer between 
Figure 4 | A proposal for the energy metabolism of ANME-2a. The
syntrophic half-reaction for ANME-2a is CH₄ + 2H₂O → CO₂ + 8e⁻ + 8H⁺.
Reducing equivalents from methane are generated through the methyl branch
of the Wood-Ljungdahl pathway and are deposited in the membrane-bound
methanophenazine pool (Mp/MpH₂) via reactions at the Hdr, Fpo, and Rnf
complexes, which oxidize CoM-SH/CoB-SH, F₄₂₀H₂, and Fd_red, respectively.
All proteins involved in the reverse methanogenesis pathway (green) have been
identified in the ANME-2a genome. We have identified proteins (orange) in
the ANME-2 genomes that may be responsible for extracellular electron
transport as outlined in the text. Haem groups are shown schematically as in
Fig. 3, and the number of haems predicted to be bound by the peptide (CxxCH
motifs) is indicated.
shown on the right. b, Size of largest MHC present in sequenced archaeal
genomes with representative model bacteria, Geobacter and Shewanella,
included as a reference. c-f, Transmission electron microscopy (TEM)
micrographs of sediment-hosted methanotrophic consortia treated with the
haem-reactive compound 3,3'-diaminobenzidine (DAB). c, d, Positive
staining of the membranes and extracellular space between archaeal and
bacterial cells in the presence of H₂O₂. e, f, Control cells from DAB
experiments where H₂O₂ was omitted. Scale bars, 500 nm. Arrows mark the
interfaces of cells.


methanotrophic ANME-2 and associated Deltaproteobacteria. This
diffusion-independent mechanism appears to largely obviate the geometric
constraints amongst ANME-SRB consortia. Interspecies electron transfer may
also contribute to greater stability of the association compared with
syntrophic exchange of a diffusible intermediate, where loss to the
environment and greater sensitivity to environmental chemical fluctuations
can limit otherwise favourable thermodynamics. The type of interspecies
electronic coupling described here and in co-cultured organisms (see also
ref. 26 and references therein) may be an underappreciated natural phenomenon
that contributes to microbial niche construction, where metabolic coupling
facilitated by direct electron transfer could function as a means of
generating stable syntrophic microbial assemblages. Additionally, the MHCs in
ANME-2 genomes may help explain the occurrence of ANME-2 aggregates without
syntrophic partners, as well as the observation of AOM with metal oxides,
where the ANME-2 may be able to grow on their own as metal oxide reducers.
Future work will be required to fully comprehend the detailed mechanisms of
electron transfer, the role of MHCs, and the potential function of
interspecies electron transfer among different ANME groups or habitats. The
culture-independent approach described here is applicable to investigating
interactions occurring in a broad range of environmental microbial
assemblages and may be amenable to use with other stable isotope tracers (for
example, deuterated water), where the central challenge is to understand how
metabolic interconnectivity and spatial relationships between organisms drive
local and bulk geochemical processes.

Note added in proof: While this manuscript was in review, a paper noting
the presence of large multi-haem cytochromes in archaea, including the
ANME-2d genome, was published.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
Extended Data Figure 1 | Image processing workflow for single cell
correlation between FISH and nanoSIMS data sets. Representative example
of data processing for an AOM consortium. a, Fiducial markers added to the
FISH image. Marker points are shown in yellow, bacterial cells in red,
archaeal cells in green. b, Corresponding fiducial markers identified on the
nanoSIMS image. c, Overlay of the warped FISH image onto the nanoSIMS
image; the transform function was defined by the points shown in a and b.
d, Overlay of the original FISH image (yellow) and the warped FISH image
(blue) highlighting a slight offset which becomes significant at single-cell
resolutions. e, Centroids of the hand-drawn ROIs displayed on the
nanoSIMS image, bacteria in red, archaea in green. f, Inverse transform applied
to the ROIs drawn on the nanoSIMS image, bringing the centroid coordinates
into 'FISH space' where we have more accurate measurement of distances
between points.
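The registration steps in this caption (fit a transform from matched fiducial markers, then apply its inverse to nanoSIMS centroids) can be sketched in a few lines. This is a minimal illustration only: it assumes an affine transform fitted by least squares, which the caption does not specify, and all coordinates and function names (`fit_affine`, `invert_affine`, `apply_affine`) are hypothetical.

```python
def solve3(m, v):
    """Solve a 3x3 linear system m @ x = v by Gaussian elimination with pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_affine(src, dst):
    """Least-squares affine map from src to dst points (assumed transform type).

    Returns ((a, b, tx), (c, d, ty)) with x' = a*x + b*y + tx and y' = c*x + d*y + ty.
    """
    rows = [(x, y, 1.0) for x, y in src]
    # Normal equations (R^T R) p = R^T t, solved once for x' and once for y'.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for k in (0, 1):
        atb = [sum(r[i] * d[k] for r, d in zip(rows, dst)) for i in range(3)]
        params.append(tuple(solve3(ata, atb)))
    return tuple(params)


def invert_affine(p):
    """Inverse of an affine map: x = A^-1 x' - A^-1 t."""
    (a, b, tx), (c, d, ty) = p
    det = a * d - b * c
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return ((ia, ib, -(ia * tx + ib * ty)), (ic, id_, -(ic * tx + id_ * ty)))


def apply_affine(p, pt):
    (a, b, tx), (c, d, ty) = p
    x, y = pt
    return (a * x + b * y + tx, c * x + d * y + ty)


# Hypothetical fiducial coordinates (pixels) marked on both images.
fish_markers = [(10.0, 10.0), (200.0, 15.0), (20.0, 180.0), (190.0, 170.0)]
true_map = ((0.5, 0.02, 3.0), (-0.01, 0.5, -2.0))  # made-up FISH -> nanoSIMS map
sims_markers = [apply_affine(true_map, p) for p in fish_markers]

fish_to_sims = fit_affine(fish_markers, sims_markers)
sims_to_fish = invert_affine(fish_to_sims)

# A centroid drawn on the nanoSIMS image, carried back into FISH space.
centroid_sims = apply_affine(true_map, (100.0, 90.0))
centroid_fish = apply_affine(sims_to_fish, centroid_sims)
```

Working in FISH space after the inverse transform keeps distance measurements on the image with the finer spatial resolution, which is the motivation given in panel f.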
Extended Data Figure 2 | Spatial and geometric relationships for modelled
aggregate geometries (well mixed to segregated) as a function of relative 
diffusivity (the ratio of growth rates to growth yields and diffusivity; see 
Supplementary Information) within the intermediate exchange model. 
Slow diffusion is on the left (equivalent to roughly half the relative diffusivity of 
hydrogen compared to measured growth rates in our system) and fast on the 
right (equivalent to 10° times faster relative diffusivity than hydrogen compared 
with measured growth rates; see Supplementary Information). a, Total 
aggregate activity normalized to the group maximum as a function of the J 
spatial metric showing a strong dependency on geometry favouring well mixed 
(low J value) geometries under slow relative diffusion (left) and almost no 
relationship with J in fast-diffusion models (right). The average activity, 
normalized across all of the regimes rather than within a single regime, also 
changes dramatically from 0.002 to 0.99 as the relative diffusivity is increased. b, 
Total normalized archaeal population activity plotted against the total bacterial
population activity within the same modelled aggregate. The total number of in
silico consortia for rows a and b is 23. c, The normalized (z-score) activity for 
archaea (red) and bacteria (green) plotted against the distance to the nearest 
three partners. d, The z-score activity for archaea (green) and bacteria (red) 
plotted against the distance to environment-aggregate interface (that is, 
aggregate surface). In plots c and d the r-squared values for each correlation are 
given at the top of each plot in colours that correspond to the two cell types. 
The number of modelled in silico bacterial and archaeal cells from c and d
plotted in the columns from left to right are: 1,138 bacterial and 1,162 archaeal
cells; 1,163 bacterial and 1,137 archaeal cells; and 1,153 bacterial and 1,147
archaeal cells. As diffusion is increased in these models from left to right, the
organisms within consortia become less dependent on each other and less
syntrophically coupled, relying instead on environmental exchange.
This leads to the highest average activity rates per consortium (compare the top
panel a to b).
Extended Data Figure 3 | Summary of aggregate characteristics.
a, Histograms displaying the distribution of cell counts per aggregate for AS and
AD consortia, blue and green respectively. b, Histograms displaying the
average activity values for the AS and AD consortia, where anabolic activity is
measured as fractional abundance of ¹⁵N per cell. c, Histograms of the number
of AS and AD consortia associated with different levels of spatial mixing
between syntrophic partners represented by the spatial mixing metric J (see
Supplementary Discussion for details on this metric). d, One-to-one
relation between bacterial and archaeal cell counts in the AS and AD consortia
analysed in this study. For all panels, the data set consists of 41 AS and 21
AD consortia. The number of cells in each aggregate can be found in the
Source Data.
Extended Data Figure 4 | Illustration of the value of single-cell resolution
activity analysis. a, Box plots showing the full range of archaeal and bacterial
single-cell activities determined by ¹⁵NH₄⁺ assimilation. The difference
between the archaeal and bacterial mean activities across all aggregates (n = 62)
is not significant (two sample t-test, P > 0.05). b, With our ability to
quantify the activity for individual phylogenetically identified cells in AOM
consortia, the average activity of the bacterial and archaeal populations within
each consortium was revealed. Assessed at the level of paired populations,
a significant difference in activity between the population of archaea and
Deltaproteobacteria within aggregates is evident (n = 62, paired-sample t-test,
P < 0.001). c, d, Adding phylogenetic resolution to this analysis by
sub-grouping consortia based on their different deltaproteobacterial partners
(AD and AS) reveals the difference between bacteria and archaea is only
significant in the AS consortia (n = 41, paired-sample t-test, P < 0.001), while
this population level offset in activity was not statistically supported within the
AD group (n = 21, paired-sample t-test, P > 0.05), illustrating differential
patterns in activity related to species membership. All axes represent ¹⁵N
fractional abundance. The 8 consortia images shown in panels b-d represent a
subset of the total 62 consortia included in the analysis, with each image
coloured by either archaeal ¹⁵N enrichment on the left (green) or bacterial ¹⁵N
enrichment on the right (pink). The degree of brightness for each cell in the
image reflects increasing levels of relative cellular ¹⁵N enrichment and the
average population value for ¹⁵N fractional abundance is provided on
the central axis.
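The contrast between panels a and b (no significant difference with a two-sample test, a clear difference with a paired test) follows from how the two statistics treat between-consortium variation. The sketch below illustrates this with made-up numbers, not data from the study. It assumes a Welch-type two-sample statistic, since the caption does not say which variant was used, and the function names are hypothetical.

```python
import math
import statistics as st


def welch_t(a, b):
    """Two-sample (Welch) t statistic treating the groups as unpaired."""
    va, vb = st.variance(a), st.variance(b)
    return (st.mean(a) - st.mean(b)) / math.sqrt(va / len(a) + vb / len(b))


def paired_t(a, b):
    """Paired-sample t statistic computed on per-consortium differences."""
    d = [x - y for x, y in zip(a, b)]
    return st.mean(d) / (st.stdev(d) / math.sqrt(len(d)))


# Made-up mean 15N fractional abundances for five consortia (illustrative only).
# Consortia differ widely from one another, but within each consortium the
# bacterial value sits slightly above the archaeal one.
archaea = [0.010, 0.020, 0.030, 0.040, 0.050]
bacteria = [0.012, 0.023, 0.032, 0.043, 0.052]

t_unpaired = welch_t(archaea, bacteria)
t_paired = paired_t(archaea, bacteria)
```

In this toy case the unpaired statistic is small because the spread between consortia swamps the offset, while the paired statistic is large because pairing removes that spread.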
Extended Data Figure 5 | Evaluation of metrics for partner mixing. The
degree of partner intermixing within an aggregate was calculated using two 
metrics (see Supplementary Information for detailed description of metrics). 
For the modified join metric (J), 1 represents random mixing, while for Moran’s 
I, 0 represents random mixing. For both metrics increasing positive values 
represent increasing partner segregation and increasing negative values 
represent increasing ordered mixing (like a checkerboard). a, Examples of 
mock aggregates which were used to verify the behaviour of the two metrics. 
b, c, The determined values for either J or Moran’s I are represented by the large 
coloured data points for each of the 41 AS aggregates or 21 AD aggregates 
analysed in this study, respectively. The black data points represent the values 
for J or Moran’s I that were calculated in 300 permutation tests where the x and
y coordinates of the archaea and bacteria cells were randomly reassigned.
When the observed mixing was more segregated than 95% of the random 
permutation tests, the data points were coloured green and considered 
significant. Similarly, when the observed mixing was found to be more orderly 
mixed than 95% of the permutation tests the data points were coloured purple. 
When the observed mixing was found to be less extreme in either direction 
than 95% of the random test aggregates the data points were coloured red. The 
two metrics, while different in their formulation, gave very similar results. It 
is noteworthy that only a single aggregate contained cells that were more 
mixed than random. As expected, the permutation tests hover around the 
random mixing values for each metric, 1 and 0 for J and Moran’s I, respectively.
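The permutation procedure described in this caption can be sketched for Moran's I as follows. This is an illustration under stated assumptions: the spatial weights (1 for cell pairs closer than a chosen radius, 0 otherwise) and the mock grid aggregates are hypothetical, and the modified join metric J is not reproduced because its definition is given only in the Supplementary Information.

```python
import math
import random


def morans_i(points, labels, radius):
    """Moran's I for binary cell labels; weight 1 for pairs closer than `radius`."""
    n = len(points)
    mean = sum(labels) / n
    dev = [v - mean for v in labels]
    num = 0.0
    wsum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) < radius:
                num += dev[i] * dev[j]
                wsum += 1.0
    return (n / wsum) * num / sum(d * d for d in dev)


def permutation_values(points, labels, radius, n_perm, rng):
    """Moran's I after randomly reassigning cell identities to positions."""
    shuffled = labels[:]
    values = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        values.append(morans_i(points, shuffled, radius))
    return values


# Mock aggregates on a 6 x 6 grid (1 = archaeon, 0 = bacterium).
points = [(float(x), float(y)) for x in range(6) for y in range(6)]
segregated = [1 if x < 3 else 0 for x, y in points]    # two separate halves
checkerboard = [int(x + y) % 2 for x, y in points]     # ordered mixing

radius = 1.1  # assumed: only edge-sharing grid neighbours count
observed = morans_i(points, segregated, radius)
checker = morans_i(points, checkerboard, radius)

rng = random.Random(0)
null = permutation_values(points, segregated, radius, 300, rng)
frac_below = sum(v < observed for v in null) / len(null)
significant_segregation = frac_below >= 0.95
```

Shuffling labels over fixed positions is equivalent to reassigning coordinates among cells, and the 95% cut-off mirrors the colouring rule used for the data points in panels b and c.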
Extended Data Figure 6 | Insensitivity of cell activities to distance from
nearest syntrophic partner for AD consortia. Plots displaying all ROIs
analysed of a given type for consortia composed of ANME-2b or ANME-2c and
Deltaproteobacteria. Normalized activity (Z-scores) were calculated within
each aggregate to allow for comparisons between consortia with large
differences in average cellular activity. a, Normalized activities of archaea
(n = 765 cells) within AD consortia as a function of distance to nearest
syntrophic partner. b, Normalized activities of bacteria (n = 658 cells) within
AD consortia as a function of distance to nearest syntrophic partner. From
this analysis, it appears that distance to nearest syntrophic partner does not
account for a significant amount of the variation in cellular activity within a
consortium. The R² values for linear regressions on the plotted data are shown
in each panel. Dashed lines illustrate the 95% confidence intervals in slopes and
intercepts of the linear regressions.
[Panel b plot: Normalized Bacteria Activity against Distance to Nearest Archaea (µm); R² = 0.012.]
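The analysis summarized here, Z-scoring activity within each aggregate, pooling cells, and reporting R² for a linear regression against distance, can be sketched as below. The numbers are invented for illustration, the use of the sample standard deviation is an assumption, and the helper names are hypothetical.

```python
import statistics as st


def zscores(values):
    """Z-score values within one aggregate (sample standard deviation assumed)."""
    mu, sd = st.mean(values), st.stdev(values)
    return [(v - mu) / sd for v in values]


def r_squared(x, y):
    """R^2 of the ordinary least-squares line y = a + b*x."""
    mx, my = st.mean(x), st.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Made-up per-cell activities and partner distances for two aggregates whose
# average activity differs about fourfold (illustrative only).
aggregates = {
    "agg1": {"activity": [0.010, 0.012, 0.011, 0.013],
             "distance_um": [0.5, 1.0, 2.0, 3.0]},
    "agg2": {"activity": [0.040, 0.046, 0.043, 0.041],
             "distance_um": [0.8, 1.5, 2.5, 4.0]},
}

pooled_z, pooled_d = [], []
for agg in aggregates.values():
    # Normalizing within each aggregate removes the offset between aggregates
    # before the cells are pooled.
    pooled_z.extend(zscores(agg["activity"]))
    pooled_d.extend(agg["distance_um"])

r2 = r_squared(pooled_d, pooled_z)
```

An R² close to zero in such a regression is what the caption reads as distance explaining little of the within-consortium variation.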


©2015 Macmillan Publishers Limited. All rights reserved 


Extended Data Figure 7 | Schematic of network analysis for microbial
consortia. a, FISH image of a representative ANME (green) and SRB (pink)
consortium. b, Highlighted regions of interest false coloured by phylogenetic
affiliation. c, Spheres of influence network of the consortia showing
connectivity between cells. d, Identification of cells that share a border with a
syntrophic partner (archaea adjacent to bacteria).
Extended Data Figure 8 | Insensitivity of cell activities to distance from
surface. Plots displaying all ROIs analysed for a given population. Normalized
activities (Z-scores) were taken within each consortium to allow for
comparisons between aggregates with large differences in average cellular
activity. a, Normalized activities of archaea within AS aggregates (n = 1,967
cells) as a function of distance to aggregate surface (that is, the external
environment). b, Normalized activities of bacteria (n = 2,063 cells) within AS
aggregates as a function of distance to aggregate surface. c, Normalized
activities of archaea (n = 765 cells) within AD aggregates as a function of
distance to aggregate surface. d, Normalized activities of bacteria (n = 658 cells)
within AD aggregates as a function of distance to aggregate surface. From
this analysis, the distance to the surface of the aggregate does not appear to
explain a significant amount of the variation in cellular activity within each
consortium. The R² values for linear regressions on the plotted data are shown
in each panel. Dashed lines illustrate the 95% confidence intervals in slopes
and intercepts of the linear regressions.




[Plots for Extended Data Figure 9, three columns ordered by relative conductivity. Recovered x-axis labels: a, J; b, B Activity (% of Agg. Max.); c, Distance to Nearest Partner; d, Distance to Surface. R² pairs printed above the columns of row c: 0.019, 0.115; 0.075, 0.192; 0.048, 0.069. Row d: 0.194, 0.306; 0.029, 0.119; 0.001, 0.009.]
Extended Data Figure 9 | Spatial and geometric relationships for all
modelled aggregate geometries as a function of relative conductivity within
the direct electron transfer model. a, Total aggregate activity normalized to
the group maximum as a function of the J spatial metric, from well-mixed
(low J) to segregated (high J) aggregate geometries (23 in silico aggregates in
total). These plots illustrate how the total activity of all aggregate geometries
changes with the relative conductivity, with less dependency on geometry
observed at the fastest conductance rates. Compare to Extended Data Fig. 2: in
the case of electron exchange presented here, the least mixed aggregates (high J)
have the highest activity. This is because our conductive treatment of the
aggregate relies on the global electric potential of each consortium, which is
strongest when the cells are spatially organized. b, Normalized archaeal activity
plotted against the normalized bacterial activity within the same modelled
aggregate. c, The normalized (z-score) activity for archaea (green) and bacteria
(red) plotted against the distance to the nearest three partners. d, The z-score
activity for archaea (green) and bacteria (red) plotted against the distance to
environment-aggregate interface (aggregate surface). In plots c and d the
r-squared values for each correlation are given at the top of each plot in colours
that correspond to the two cell types. The number of modelled in silico
bacterial and archaeal cells from c and d plotted in the columns from left to
right are: 1,138 bacterial and 1,162 archaeal cells; 1,161 bacterial cells and 1,139
archaeal cells; and 1,134 bacterial and 1,166 archaeal cells.
Structural basis for gene regulation by a
B₁₂-dependent photoreceptor


Marco Jost, Jésus Fernandez-Zapata, Maria Carmen Polanco, Juan Manuel Ortiz-Guerrero, Percival Yang-Ting Chen,
Gyunghoon Kang, S. Padmanabhan, Montserrat Elias-Arnanz & Catherine L. Drennan


Photoreceptor proteins enable organisms to sense and respond to light. The newly discovered CarH-type photoreceptors
use a vitamin B₁₂ derivative, adenosylcobalamin, as the light-sensing chromophore to mediate light-dependent gene
regulation. Here we present crystal structures of Thermus thermophilus CarH in all three relevant states: in the dark,
both free and bound to operator DNA, and after light exposure. These structures provide visualizations of how
adenosylcobalamin mediates CarH tetramer formation in the dark, how this tetramer binds to the promoter −35
element to repress transcription, and how light exposure leads to a large-scale conformational change that activates
transcription. In addition to the remarkable functional repurposing of adenosylcobalamin from an enzyme cofactor to a
light sensor, we find that nature also repurposed two independent protein modules in assembling CarH. These results
expand the biological role of vitamin B₁₂ and provide fundamental insight into a new mode of light-dependent gene
regulation.


Light allows for photosynthesis and other essential light-dependent
chemical reactions. Light also triggers photo-oxidative stress via
generation of reactive oxygen species, which rapidly damage the cell.
Organisms in all domains of life produce proteins capable of sensing
light. These biological photoreceptors participate in vision, harness
light energy, regulate circadian clocks, and mediate gene expression
and major developmental processes. Different classes of photoreceptor
proteins with a variety of chromophore cofactors exist to sense light
over the visible and ultraviolet spectrum. This group of photoreceptors
was recently expanded by a new class that is widespread in bacteria
and uses the vitamin B₁₂ derivative adenosylcobalamin (AdoCbl) as the
chromophore to orchestrate light perception and response. The prototypes
of this class, the CarH-type photoreceptors, have been characterized
in Myxococcus xanthus and T. thermophilus and typically
regulate light-induced expression of carotenoid biosynthetic genes,
which results in carotenoid-mediated protection against photo-oxidative
damage. This class of photoreceptors thereby allows bacteria
to mitigate the detrimental effects of sunlight, a critical determinant of
survival in light-exposed conditions, while avoiding unnecessary
production of carotenoids in the absence of light.

The CarH-type photoreceptors consist of an amino (N)-terminal
DNA-binding domain and a carboxy (C)-terminal AdoCbl-binding
and oligomerization domain to directly sense light and regulate gene
expression. In the dark, AdoCbl-bound CarH, a tetramer, binds to the
promoter region of target genes to repress transcription. Exposure to
blue, green, or ultraviolet light disrupts the photosensitive Co-C bond
in AdoCbl, leading to tetramer disassembly, loss of operator-binding,
and activation of gene expression (Fig. 1). Thus, AdoCbl, which is
typically used as a cofactor for radical-based enzyme reactions, is
now being used for a new biological function as a light sensor. Here,
we sought to determine the structural basis for the functional
repurposing of one of nature’s most complex metallocofactors and for this
new mode of light-dependent gene regulation.


CarH has a modular architecture 


To visualize the CarH domain structure and the architecture of the
repressor ‘dark’ state, we first determined two independent structures
of AdoCbl-bound CarH from T. thermophilus to 2.15 Å and 2.80 Å
resolution (Extended Data Table 1). To prevent cleavage of the
light-sensitive AdoCbl Co-C bond, we performed all crystallization
experiments under red light and all diffraction data collection at T = 100 K.
Both single-crystal ultraviolet-visible (UV-vis) spectra and the electron
density confirmed that AdoCbl remained intact with a Co-C
bond length of 2.0 Å (Extended Data Fig. 1), indicating that the structures
are in the ‘dark’ state. Each monomer of CarH has a modular
three-domain architecture with an N-terminal winged-helix DNA-binding
domain followed by the light-sensing domain, composed of a
four-helix bundle and a C-terminal Rossmann-fold cobalamin (Cbl)-binding
domain (Fig. 2a).

The DNA-binding domain is structurally similar to those of MerR
family transcription factors (Fig. 2b) and to that of the
AdoCbl-independent CarH paralogue CarA, featuring a canonical recognition
helix and a β-hairpin wing for DNA binding (Fig. 2a). In our
DNA-free structures, the DNA-binding domain samples different
orientations for the different CarH protomers in the asymmetric unit,
enabled by a flexible linker region and stabilized by crystal lattice
contacts (Extended Data Fig. 2).

In contrast to the flexible DNA-binding domains, the four-helix
bundle and the Rossmann domain are structurally rigid, together
forming a module that binds the AdoCbl light sensor. AdoCbl is
sandwiched between the four-helix bundle, which interacts with the
upper axial 5'-deoxyadenosyl (5'-dAdo) ligand, and the Rossmann-fold
domain, which binds the Cbl lower face in the base-off/His-on
mode with the side chain of His177 displacing the Cbl
dimethylbenzimidazole base (Fig. 2a, d and Extended Data Fig. 1b). Instead of
closely resembling an AdoCbl-dependent enzyme, the CarH light-sensing
domain is structurally homologous to the methylcobalamin
(MeCbl)-binding module of methionine synthase MetH (Fig. 2b),
even though the AdoCbl 5'-dAdo group is much bulkier than the
MeCbl methyl group in MetH. Modelling AdoCbl into MetH leads
to several steric clashes (Fig. 2c), but in CarH, a small but important
shift of 2.5 Å of the four-helix bundle enlarges the cavity on the Cbl
upper face (Fig. 2d, e). Additionally, four hydrophobic residues at the
Cbl upper face in MetH are replaced in CarH, providing the 5'-dAdo
group a larger binding pocket (Leu715→Val138), a hydrogen bonding
interaction (Val718→Glu141), and more polar environment in
general (Val719→His142, Phe708→Trp131) (Fig. 2d, f). Although
Trp is larger than Phe, the orientation of the Trp side chain on the
side of the upper ligand rather than directly above perfectly
accommodates the 5'-dAdo group (Fig. 2c, d). Notably, Trp131, Glu141, and
His142 are highly conserved in CarH homologues, suggesting that
this mode of AdoCbl binding is conserved as well (Extended Data
Fig. 3).

Figure 1 | Schematic of CarH-mediated light-dependent gene regulation.
Structures of all three relevant states are reported herein. Red circles depict
AdoCbl, filled red semicircles photolysed Cbl, and open red semicircles
4',5'-anhydroadenosine, the recently identified photolysis product of
CarH-bound AdoCbl. See main text for details.


CarH is a dimer-of-dimers type tetramer 


AdoCbl-bound CarH is a tetramer in the crystal structure (Fig. 3),
consistent with results from size-exclusion chromatography (SEC)
and analytical ultracentrifugation. Four light-sensing domains
comprise the core of the tetramer (Fig. 3a-d) with the DNA-binding
domains extending outward (Fig. 3e, f). The core has a dimer-of-dimers
architecture, with each constituent dimer composed of two
CarH light-sensing domains in a head-to-tail orientation (Fig. 3a).
The extensive head-to-tail dimer interface is formed by the four-helix
bundles and the Cbl-binding domains and features a solvent-buried
area of 1,430 Å² on each protomer as well as numerous hydrogen
bonds and ionic interactions involving various side chains and the
5'-dAdo group of AdoCbl (Fig. 3a, b). Two such head-to-tail dimers
assemble to a tetramer in a staggered fashion (Fig. 3c-f). This
dimer-dimer interface is again formed by the light-sensing domains, creating
a buried surface area of 1,590 Å² on each of the two head-to-tail
dimers, whereas the four DNA-binding domains are positioned on



Figure 2 | Structure of CarH protomer and comparison with MetH. a, CarH
protomer coloured by domain: N-terminal DNA-binding domain (cyan)
with recognition helix (dark blue) and β-hairpin wing (purple) highlighted;
central four-helix bundle (yellow); C-terminal Cbl-binding domain (green).
AdoCbl shown with Cbl carbons in pink, 5'-dAdo group carbons in cyan,
cobalt in purple. b, Overlay of CarH protomer with Cbl-binding module of
MetH (grey, PDB accession number 1BMT) and BmrR DNA-binding
the surface of the tetramer and only make minor contributions to the
interface (Fig. 3e, f). It was previously demonstrated by SEC that CarH
lacking the DNA-binding domain still forms tetramers, and here we
find that CarH adopts the same tetramer architecture when the DNA- 
binding domains are proteolytically removed during crystallization 
(Fig. 3c, e). Thus, the light-sensing domains appear to mediate tetra- 
merization, positioning the DNA-binding domains on the surface to 
engage DNA. 

To confirm the CarH tetramer architecture and the mode of AdoCbl 
binding, we mutated residues in the AdoCbl binding site, at the head-to- 
tail dimer interface, and at the dimer-dimer interface and analysed these 
mutants using SEC and electrophoretic mobility shift assays (EMSAs) 
(Extended Data Figs 3 and 4). Non-conservative mutations in the binding
site for the 5'-dAdo group (W131A, E141A; Fig. 2d) impaired tetramer
formation and mutations near the head-to-tail dimer interface
(H142A, D201R; Fig. 3b) completely abolished it. Remarkably, the most 
drastically adverse H142A and D201R mutations also appeared to impair 
AdoCbl binding (Extended Data Fig. 4a, c; absorbance traces at 522 nm). 
Moreover, DNA binding affinity weakened with decreased ability to 
form tetramers in the W131A, E141A, H142A, and D201R mutants 
(Extended Data Figs 3a and 4g). For comparison, we also introduced 
the conservative W131F mutation, which behaved like wild-type (WT) 
CarH in its oligomerization and DNA binding properties (Extended 
Data Fig. 4a, g). The inability of the D201R mutant protein to oligomerize 
is consistent with the observed Asp201/Arg176 interaction playing an 
important role in stabilizing the head-to-tail dimer interface (Fig. 3a). We 
expected a second compensatory mutation, R176D or R176E, to restore 
the interaction and, indeed, the R176D/D201R and R176E/D201R dou- 
ble mutants could form tetramers (Extended Data Fig. 4c) and bind to 
DNA, albeit less efficiently than WT CarH (Extended Data Fig. 4g). 
Finally, replacing Gly160 and Gly192 at the CarH dimer-dimer interface
(Fig. 3d) by the bulkier Gln resulted in dimers in the presence of AdoCbl, 
a form previously never observed for WT CarH (Extended Data Fig. 4e). 
Although both these mutants bound to the DNA probe in the dark, they 
formed a smaller size (higher mobility) complex than WT CarH, sug- 
gesting that their binding mode is distinct (Extended Data Fig. 4g). Both 
mutants furthermore bound DNA with reduced affinity and cooperativ- 
ity compared with WT CarH (Extended Data Fig. 4h). Together, these 
results are consistent with the observed CarH tetramer architecture and 
indicate that this architecture is critical for DNA binding. 


CarH binds the promoter −35 element

To reveal the mode of transcriptional repression, we next sought to
visualize CarH bound to its cognate DNA operator. The CarH operator
lies within a 110-base-pair (bp) segment of the intergenic region
between carH and the carotenogenic crtB. Using systematically truncated
DNA probes in EMSAs (Extended Data Fig. 5a, b), this operator
domain (red, PDB accession number 1EXJ). c, MetH binds MeCbl (methyl
group in yellow); modelling 5'-dAdo (cyan) results in steric clashes. d, CarH
accommodates AdoCbl through several substitutions compared with MetH.
Cobalt-coordinating His in green. e, Superposition of MetH and CarH,
highlighting shift of the four-helix bundle between MetH (grey) and CarH
(yellow). f, Alignment of CarH and MetH sequences involved in binding the
Cbl upper face, highlighting substitutions.
sequence was narrowed down to a 30-bp segment. Diffraction quality
crystals of AdoCbl-bound CarH were obtained in complex with a
blunt-ended 26-bp DNA segment (−47 to −22 relative to carH transcription
start site, Extended Data Fig. 5a), allowing us to determine the crystal
structure of DNA-bound CarH to 3.89 Å resolution (Fig. 4, Extended
Data Fig. 6a-h and Extended Data Table 1).

The structure revealed a unique mode of DNA binding involving three
of the four DNA-binding domains of tetrameric CarH (Fig. 4a). The
fourth DNA-binding domain is disordered and not visible in the electron
density. The overall architecture of the tetramer is the same before and
after DNA binding except for a reorientation of the DNA-binding
domains (Extended Data Fig. 6e). All three DNA-binding domains face
the same way on the DNA segment and bind to a set of three 11-bp
repeats with a consensus sequence (A/G)A(G/C)(A/C)T(A/G/T)

Figure 4 | CarH DNA binding. a, CarH tetramer bound to a 26-bp DNA 
segment (yellow). CarH is shown in ribbons with one head-to-tail dimer in 
green (Cbl-binding domains) and yellow (helix bundles) and the other dimer in 
grey. AdoCbl shown as in Fig. 2a. DNA-binding domains are shown in cyan. 
Sequence of DNA segment used for crystallization (larger font) as well as 
flanking sequences in the operator (smaller font) are shown below. Cyan bars 
indicate base pairs covered by each DNA-binding domain. Base pairs covered 
by the recognition helix are boxed; red box highlights the promoter −35
element. Nucleotides protected from hydroxyl radical cleavage are indicated by 
bullets. The orientation of the DNA in the structure was confirmed by heavy 
atom labelling (Extended Data Fig. 6b-d). b, Schematic of CarH-DNA 
Figure 3 | CarH oligomerization. a, CarH
protomers arranged in a head-to-tail dimer, 
coloured by domain (helix bundle: yellow; 
Cbl-binding domain: green) with left protomer 
shown in lighter colours. AdoCbl shown as in 
Fig. 2a. DNA-binding domains hidden for clarity. 
b, Close-up of selected residues at the dimer 
interface. c, Core of CarH tetramer, assembled 
from two head-to-tail dimers. Top dimer is 
coloured as in a; bottom dimer is coloured in black 
and grey. Gly160 and Gly192 at the dimer-dimer 
interface are shown as red spheres. Structure is 
from a sample of CarH that degraded during 
crystallization and lacks the DNA-binding 
domains (crystal form 2, see Methods). d, Close-up 
of Gly160 and Gly192 (red spheres) from two 
protomers at the dimer-dimer interface. 

e, Tetramer of full-length CarH including DNA- 
binding domains (crystal form 3), coloured as in 
c. DNA-binding domains of coloured dimer are 
shown in dark cyan, those of grey dimer are shown 
in light cyan. f, Additional view of CarH tetramer, 
revealing how DNA-binding domains are 
positioned on the protein surface. 


(T/G)ACA(A/T) (Fig. 4a). This parallel orientation is stabilized by spe- 
cific interactions between adjacent DNA-binding domains (Extended 
Data Fig. 6f). The central DNA-binding domain comes from one 
head-to-tail dimer, whereas the two flanking DNA-binding domains 
come from the second head-to-tail dimer (Extended Data Fig. 6g, h). 
These structures suggest that the two individual head-to-tail dimers 
would bind to DNA less tightly, consistent with the reduced affinity 
and cooperativity for the dimeric G160Q and G192Q CarH mutants 
(Extended Data Figs 3a and 4g, h). CarH rendered monomeric by light 
exposure or mutagenesis (H142A, D201R) binds DNA with even 
further reduced affinity (Extended Data Figs 3a and 4g). 
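
As an illustration only, the 11-bp repeat consensus quoted above can be written as a regular expression and used to scan a sequence for candidate repeats. This is a minimal sketch that assumes the consensus reads (A/G)A(G/C)(A/C)T(A/G/T)(T/G)ACA(A/T); the function name `find_repeats` and the test sequence are hypothetical and are not taken from the carH operator.

```python
import re

# One 11-bp repeat, written position by position from the consensus
# (A/G)A(G/C)(A/C)T(A/G/T)(T/G)ACA(A/T) as quoted in the text.
CONSENSUS = "[AG]A[GC][AC]T[AGT][TG]ACA[AT]"

# A lookahead is used so that overlapping matches are also reported.
_PATTERN = re.compile(f"(?=({CONSENSUS}))")


def find_repeats(seq):
    """Return (start_index, matched_11mer) for every consensus hit in seq."""
    return [(m.start(), m.group(1)) for m in _PATTERN.finditer(seq.upper())]


# Hypothetical sequence: three consensus-matching repeats placed head to tail,
# mimicking the parallel arrangement described for the three bound domains.
example = "AAGCTTGACAA" + "GACATATACAT" + "AAGATGTACAA"

for start, repeat in find_repeats(example):
    print(start, repeat)
```

In this made-up sequence the first repeat contains TTGACA, the −35 element sequence mentioned in the text, at positions 5 to 10 of the 11-mer.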

To obtain support for this unusual DNA binding mode, hydroxyl 
radical and DNase I footprinting were used to examine the regions of 
DNA protected by CarH. The DNase I footprint, which was obtained 
interactions for the central DNA-binding domain, denoted as follows: black 
arrows, hydrogen bonds/electrostatic interactions; green arrows, van der Waals 
interactions; solid lines, contacts from protein side chains; dashed lines, 
contacts from main chain; (M), contacts in the DNA major groove; (m), 
contacts in the DNA minor groove. c, Close-up of interactions between CarH 
(cyan) and DNA (yellow). Hydrogen bonds and ionic interactions to the 
phosphate backbone are shown as black dashed lines. Contacts to DNA bases 
are shown in purple. Side-chain orientation is not unambiguous owing to the 
modest resolution of this structure, but many of the contacts shown are 
supported by mutagenesis (Extended Data Fig. 7). 
using a longer DNA segment (130 bp) than the one used in the crystal 
structure (26 bp), still matches the crystal structure footprint (Extended 
Data Fig. 5a, c). Additionally, three evenly spaced 4-bp hydroxyl radical 
footprints are observed on both the sense and the antisense strand 
(Extended Data Fig. 5c) that correlate with where the ‘wings’ of the 
three DNA-binding domains contact the minor groove (Fig. 4a). 
Taken together, the size of the DNase I footprint, the presence of three 
hydroxyl radical footprints, and the observation that the DNA sequence 
contains three repeats (see above), suggest that CarH binds to DNA 
using three of its four DNA-binding domains. 

To determine whether all three repeats are important for high- 
affinity binding of CarH to DNA, we tested the effect of mutating 
DNA bases in the CarH operator (Extended Data Fig. 5d, e). Mutating 
dinucleotides in any single repeat only led to a small decrease in 
affinity as evidenced by the intense retarded bands for the CarH- 
DNA complex and the small amounts of free DNA in EMSAs 
(mutants 1-3 or 8-10 in Extended Data Fig. 5e). In contrast, simulta- 
neously mutating dinucleotides in any two of the three repeats or in all 
three repeats almost completely abolished DNA binding (mutants 
4-7 and 11-14 in Extended Data Fig. 5e). As a control, we also 
mutated DNA bases in the operator that CarH does not contact 
directly, and as expected, WT CarH bound to these mutants with 
similar affinity (mutants 15-18 in Extended Data Fig. 5e). Given that 
the results of mutations were similar for each of the three repeats, it 
appears that all three repeats are important in determining CarH- 
DNA affinity. 

Each DNA-binding domain forms hydrogen bonds and electrostatic 
interactions to the phosphate backbone, contributed from both the pep- 
tide backbone and the side chains of Trp26, Tyr30, Arg37, Arg43, and 
Lys67 (Fig. 4b, c), and each domain also inserts His42 of its β-hairpin 
‘wing’ into the DNA minor groove. Finally, each DNA-binding domain 
places its recognition helix in the DNA major groove, where it recognizes 
a 6-bp stretch (Fig. 4a) using specific hydrogen bonds from the side 
chains of Gln25, Arg28, and Arg29 (Fig. 4b, c). Strikingly, the major 
groove sequence occupied by the central DNA-binding domain spans 
the promoter −35 element (TTGACA, red box in Fig. 4a) for the major 
σA/σ70-associated bacterial RNA polymerase. Thus, these structures 
reveal the mechanism of transcriptional repression: CarH occupies the 
−35 element, blocks access by the RNA polymerase-σA holoenzyme, 
and thereby prevents transcription initiation. 

To verify the observed mode of DNA binding, we generated the 
Q25A, R29A, Y30A, H42A, and R43A CarH mutants and tested their 
DNA binding capacity using EMSAs. Mutating the conserved Arg29 or 
Arg43 to Ala abolished DNA binding but did not affect AdoCbl- 
dependent tetramerization (Extended Data Figs 3a and 7), consistent 


Figure 5 | Light-induced conformational changes in CarH. a, Structure of 
light-exposed CarH, with the helix bundle in orange and Cbl-binding 
domain in grey. DNA-binding domain is hidden for clarity (see Extended Data 
Fig. 8a). Cbl shown with carbons in pink and cobalt in purple. b, Structure of a 
CarH protomer in the dark state, shown with helix bundle in yellow, 
Cbl-binding domain in green, and 5’-dAdo group in cyan. Coloured lines 
highlight domain orientations. c, Light-induced helix bundle movement causes 
tetramer disassembly. Shown is a head-to-tail dimer of CarH in the dark state, 
with a role of these residues in DNA binding. Notably, the Q25A 
mutant retained DNA-binding capacity, indicating that Gln25 could 
be more important for mediating DNA specificity than affinity 
(Extended Data Fig. 7c). Finally, the H42A and Y30A mutants only 
showed mildly reduced affinity, suggesting that their interactions are 
not essential for DNA binding (Extended Data Figs 3a and 7c). 


Light triggers helix bundle movement 

Finally, to examine how light exposure causes tetramer disassembly, we 
determined the crystal structure of light-exposed CarH to 2.65 Å resolution 
(Fig. 5, Extended Data Fig. 8a, b and Extended Data Table 1). The 
structure contains monomeric CarH with bound Cbl but without the 
5'-dAdo group, which dissociated as a consequence of light exposure 
(Fig. 5a). The four-helix bundle and the Cbl-binding domain individu- 
ally do not exhibit major conformational changes compared with the 
dark AdoCbl-bound structure. However, the orientation of the helical 
bundle relative to the Cbl-binding domain has changed drastically with 
a>sA displacement (Fig. 5a, b). This helix bundle movement would 
disrupt the head-to-tail dimer interface (Fig. 5c), leading to tetramer 
disassembly, dissociation from DNA, and transcriptional activation. 

Tetramer disassembly is triggered by loss of the AdoCbl 5’-dAdo 
group: its presence in the AdoCbl-bound structure blocks movement 
of the helix bundle, owing in large part to the positioning of W131 
against the upper Cbl ligand (Fig. 2d), keeping the CarH protomers in 
the extended ‘upright’ conformation required for tetramerization. 
Loss of the 5'-dAdo group leaves a large cavity on the Cbl upper face 
(Fig. 5d), prompting movement of the helix bundle to occupy this void 
and cover the Cbl (Fig. 5e). 

Strikingly, the helix bundle motion brings His132 from the protein 
surface to the Cbl upper face, where it binds to the cobalt to form bis-His 
ligated Cbl (Fig. 5d, e and Extended Data Fig. 8b). Such bis-His ligation, 
common for haems, has not been reported for Cbl, although bis-His Cbl 
ligation was recently proposed on the basis of mass spectrometry for the 
Cbl-dependent transcription factor AerR. We therefore validated 
formation of bis-His-ligated Cbl using UV-vis spectroscopy. Spectra 
of light-exposed WT CarH and a H132A mutant, which is unable to 
form the bis-His ligation, resemble those of free Cbl with two or one 
nitrogen-based ligands, respectively (Extended Data Fig. 8c, d). In 
contrast, the spectra of the AdoCbl-bound proteins are identical 
(Extended Data Fig. 8e). These results provide unambiguous evidence 
for a bis-His-ligated Cbl in light-exposed CarH and suggest that this 
mode of coordination might be used more frequently in non-haem 
proteins. 

Notably, both WT and H132A CarH undergo light-dependent 
tetramer disassembly, indicating that bis-His ligation is not required 


but the right protomer is replaced by the structure of light-exposed CarH to 
show the steric clash. d, Departure of the 5'-dAdo group after light 

exposure leaves a large cavity on the Cbl upper face. The helix bundle (yellow) 
and the Cbl-binding domain (green) are shown in surface representation 
with selected residues shown as sticks. e, Helix bundle movement fills the cavity 
at the Cbl upper face and brings His132 to the cobalt, where it occupies the 
open coordination site. Colouring as in a. 
for disassembly (Extended Data Fig. 8f, g). However, Cbl dissociation 
after light exposure is faster for H132A CarH than for WT CarH, 
which forms a very tight and stable complex with the photolysed 
Cbl, as indicated by the relative abilities of the protein to be reconstituted 
with fresh AdoCbl (Extended Data Fig. 9a, b) and by the 
observation of a CarH:Cbl adduct in mass spectrometry for WT 
CarH but not for the H132A mutant (Extended Data Fig. 9c, d). 
Thus, the bis-His ligation could be important to retain the Cbl cofac- 
tor after photolysis. 


Discussion 


Impressively, CarH-type photoreceptors are found in hundreds of 
bacterial genomes, including bacteria that take up rather than biosynthesize 
AdoCbl. CarH is distinguished from most known classes 
of photoreceptors (with the exception of some light-oxygen-voltage 
(LOV)-type photoreceptors) in that it can bind to DNA directly, 
instead of requiring additional proteins for gene regulation. Beyond 
gene regulation, the CarH light-sensing domains can be found fused 
to effector domains such as histidine kinases and in stand-alone modules 
that could undergo light-dependent protein-protein interactions. 
This versatility probably explains the broad distribution of 
CarH-like proteins in bacteria. 

The use of AdoCbl as a light-sensing chromophore by CarH is 
biologically unprecedented. AdoCbl is structurally and photochemi- 
cally distinct from known photoreceptor chromophores such as 
bilin, flavins, retinal, or Trp side chains: light exposure leads 
to breakage of a covalent Co—C bond, whereas other chromophores 
undergo less drastic changes such as light-induced electron transfer or 
cis-trans isomerizations. In all cases, however, light energy is ulti- 
mately harnessed to drive a large-scale conformational change, high- 
lighting the convergence of different light sensing mechanisms. 
AdoCbl was previously best-known as a cofactor for radical-based 
enzyme reactions, in which reversible homolytic cleavage of the 
Co-C bond provides access to the 5’-dAdo radical for catalysis, 
and as a modulator of gene expression via riboswitches. Our 
structures now allow us to visualize how AdoCbl is repurposed as a 
light sensor in CarH: in the dark, the AdoCbl 5’-dAdo group acts as a 
molecular doorstop that keeps CarH protomers in an extended 
‘upright’ conformation for tetramerization, and light exposure trig- 
gers collapse into a kinked conformation. Whereas AdoCbl photolysis 
is an unwanted side reaction in enzyme catalysis because it leads to 
cofactor inactivation, in CarH this light sensitivity is harnessed to 
drive a light-dependent gene expression switch and a change in physi- 
ology. Remarkably, use of AdoCbl in this alternative function appears 
to come with a safeguard: the product of CarH-bound AdoCbl photolysis 
is not a 5’-dAdo radical, but rather 4’,5’-anhydroadenosine, 
which differs by one proton and one electron, and cannot cause rad- 
ical damage. Thus, AdoCbl now joins the list of enzyme cofactors that 
have been repurposed as sensors; a list that already includes flavins (as 
light sensors in LOV, blue light sensor using flavin (BLUF), and 
cryptochrome photoreceptors) and haems (as sensors of oxygen 
and other small molecules). 

Our CarH structures additionally reveal the functional repurposing 
of two different protein modules. The CarH light-sensing domain mir- 
rors the Cbl-binding module of methionine synthase MetH, an enzyme 
that uses a MeCbl intermediate in the transfer of a methyl group from 
methyltetrahydrofolate to homocysteine, generating tetrahydrofolate 
and methionine. But whereas MetH uses its helix bundle to position 
Phe708 over the MeCbl methyl group and protect it from photolysis, 
CarH, enabled by specific substitutions at the Cbl upper face, uses this 
fold as an AdoCbl-binding light-sensing domain, in which the corres- 
ponding Trp131 senses the presence of the 5'-dAdo group and trans- 
mits the signal of AdoCbl photolysis by leading a conformational 
change of the helix bundle. Furthermore, whereas the Cbl-binding 
module of MetH is embedded in a 136 kDa multi-domain protein 
and juxtaposed to different substrate-binding domains via transient 
domain-domain interactions during a catalytic cycle, the light-sensing 
domain of CarH is used to assemble a tetramer that is stable enough 
to occlude the −35 element from RNA polymerase. Thus, this module 
has been repurposed from a methyl group carrier in primary metabol- 
ism to a light-sensing modulator of oligomerization state. 

Similarly, the CarH DNA-binding domain resembles those of 
MerR-type transcription factors such as MerR, BmrR, and SoxR, 
whose role as transcriptional activators in the presence of heavy 
metals or other stresses has been established and whose DNA-bound 
structures have been reported. Whereas MerR proteins bind as 
dimers to a (pseudo-)palindromic DNA sequence and distort the 
DNA, which brings promoter elements into alignment for transcrip- 
tional activation (Extended Data Fig. 6i), the CarH tetramer uses its 
DNA-binding domains to bind to three contiguous repeat sequences 
in a parallel mode, which occludes the −35 element and represses 
transcription. This unique DNA binding mode rationalizes the tetramer 
architecture of two head-to-tail dimers of CarH, which is 
unusual for transcription factors but here enables the DNA-binding 
domains to arrange in a parallel fashion and cooperatively engage the 
repeat sequences. 

AdoCbl is a biologically expensive molecule, requiring arduous 
pathways for biosynthesis or specialized machinery for uptake. For 
CarH-using organisms, it would not be surprising if there were a 
recovery mechanism for Cbl following tetramer disassembly, and it 
is tempting to suggest that formation of bis-His ligated Cbl in some 
CarHs might be the first step of such a recovery pathway. In this 
regard, it is interesting to note that His132 is strictly conserved in 
thermophilic bacteria (Extended Data Fig. 3b), where perhaps the 
bis-His ligated Cbl is an adaptation to elevated temperatures. Although use 
of AdoCbl as a light sensor comes at a price, it appears that the 
physiological benefits make this repurposing worthwhile. 

Altogether, our results provide fundamental insight into a new 
mode of light-dependent gene regulation and reveal an exquisite 
example of cofactor and protein domain repurposing. The structures 
furthermore provide a basis for deployment of the modular CarH 
photoreceptors, in which the light-sensing and DNA-binding activ- 
ities rest on different domains, for engineering light-modulated tran- 
scriptional control or protein-protein interactions. 


Note added in proof: A paper describing a detailed photochemical 
mechanism for CarH based on time-resolved spectroscopic data has 
just been published. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to allocation during 
experiments and outcome assessment. 

Protein constructs. Cloning of the pET15b-CarH construct encoding Thermus 
thermophilus CarH with an N-terminal His6-tag was described previously. The 
H132A mutation was introduced into pET15b-CarH using QuikChange PCR muta- 
genesis (Stratagene) with Pfu Turbo DNA polymerase. All other mutants were 
obtained by gene synthesis (Genscript) with appropriate 5’ and 3’ restriction sites 
for cloning into the pET15b expression vector. 

Protein purification. WT CarH and mutants were purified as described previously. 
A slightly modified protocol was used for His6-tagged CarH for crystallization. 
After expression and affinity chromatography, performed as described 
previously, a threefold molar excess of AdoCbl (Sigma) was added and the mixture 
was incubated on ice for 1 h. All subsequent handling was performed in a dark 
room under red light. The protein solution was applied to a HiLoad 26/60 Superdex 
200 size exclusion column (GE Healthcare) pre-equilibrated with CarH buffer 
(0.1 M NaCl, 0.05 M Tris-HCl, pH 8). Under these conditions, tetrameric 
AdoCbl-bound CarH eluted as a single peak, separate from residual amounts of 
monomeric CarH. Fractions containing AdoCbl-bound CarH were combined and 
concentrated to about 8 mg ml⁻¹, as judged by the absorbance at 280 nm using the 
combined ε280 nm for AdoCbl (22.5 mM⁻¹ cm⁻¹, determined spectroscopically on 
the basis of published extinction coefficients at 260 nm, 288 nm, and 522 nm (refs 
41-43)) and for CarH (37.9 mM⁻¹ cm⁻¹, calculated from the protein sequence 
using ProtParam at http://web.expasy.org/protparam). 
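
The concentration estimate described above follows the Beer-Lambert law with the two extinction coefficients summed. The sketch below is a hedged illustration, assuming one AdoCbl per CarH protomer and a 1 cm path length by default; the function names are hypothetical, and the molecular mass needed to convert to mg ml⁻¹ is not given in the text, so it is left as a caller-supplied parameter.

```python
# Extinction coefficients at 280 nm as quoted in the text (mM^-1 cm^-1).
EPS280_ADOCBL = 22.5
EPS280_CARH = 37.9


def holo_carh_conc_mM(a280, path_cm=1.0):
    """Concentration (mM) of AdoCbl-bound CarH from A280 via Beer-Lambert.

    Assumes a 1:1 protein:cofactor complex, so the coefficients add.
    """
    eps_total = EPS280_ADOCBL + EPS280_CARH
    return a280 / (eps_total * path_cm)


def mM_to_mg_per_ml(conc_mM, mol_mass_da):
    """Convert mM to mg/ml; mol_mass_da is an input, not a value from the text."""
    return conc_mM * mol_mass_da / 1000.0


# Example: a diluted sample reading A280 = 0.604 in a 1 cm cuvette.
print(holo_carh_conc_mM(0.604))
```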

Purified native and mutant protein identities were verified before use by high- 
performance liquid chromatography (HPLC) coupled to electrospray ionization- 
time-of-flight (ESI-TOF) or ion-trap mass spectrometry using an Agilent 1100 
Series HPLC equipped with a µ-well plate autosampler and a capillary pump 
and connected to an Agilent Ion Trap XCT Plus Mass Spectrometer with an ESI 
interface (Agilent Technologies). Samples were injected into a Zorbax Poroshell 
300 SB-C18 HPLC column (Agilent Technologies) that was coupled online to the 
mass spectrometer using an electrospray interface. Samples were separated at 60 °C 
at a flow rate of 0.2 ml min⁻¹ using a linear gradient of buffer A (water/acetonitrile/ 
formic acid, 95:4.9:0.1) to 90% buffer B (water/acetonitrile/formic acid, 10:89.9:0.1) 
over 30 min and protein elution was monitored at 210 nm and 280 nm. Mass 
spectra were acquired in the positive ion mode in an m/z range from 100 to 2,200. 
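
To make the gradient concrete, the following sketch computes the mobile-phase composition at a given time. It assumes the gradient starts at 100% buffer A (0% buffer B), that mixing is linear by volume, and that the quoted buffer ratios are volume percentages; none of these is stated explicitly in the text, and the function names are hypothetical.

```python
ACN_A = 4.9    # % acetonitrile in buffer A (water/acetonitrile/formic acid, 95:4.9:0.1)
ACN_B = 89.9   # % acetonitrile in buffer B (water/acetonitrile/formic acid, 10:89.9:0.1)


def percent_b(t_min, start_b=0.0, final_b=90.0, duration_min=30.0):
    """Percentage of buffer B at time t_min for a linear gradient (clamped)."""
    t = min(max(t_min, 0.0), duration_min)
    return start_b + (final_b - start_b) * t / duration_min


def percent_acetonitrile(t_min):
    """Acetonitrile content (%) of the mixed mobile phase at time t_min."""
    frac_b = percent_b(t_min) / 100.0
    return (1.0 - frac_b) * ACN_A + frac_b * ACN_B


for t in (0, 15, 30):
    print(t, percent_b(t), round(percent_acetonitrile(t), 2))
```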

The integrity of the AdoCbl Co-C bond was assessed by UV-vis spectroscopy 
(described below). Protein containing intact AdoCbl was flash-frozen in liquid 
nitrogen until further use. CarH containing photolysed AdoCbl was generated by 
exposing the protein solution to ambient light for 30 min at 4°C. Complete 
photolysis was assessed by UV-vis spectroscopy (described below). Light- 
exposed CarH was used for crystallization experiments immediately. CarH- 
DNA complexes for crystallization were generated by mixing protein and DNA 
at the desired ratio and incubating the mixture for 1h on ice in the dark before 
crystallization experiments. 

Preparation of DNA segments for crystallization. HPLC-purified single- 
stranded DNA oligonucleotides without heavy atom labels (Integrated DNA 
Technologies) or containing a single 5-iodo-deoxycytidine (Jena Bioscience) were 
dissolved to 1 mM in CarH buffer. Equimolar amounts of complementary oligonu- 
cleotides were mixed, heated to 95 °C for 10 min, and then slowly left to cool down to 
4°C in a thermocycler over the course of 1h for annealing. Final double-stranded 
DNA concentrations were assessed by the absorbance at 260 nm using the calculated 
sequence-specific ε260 nm (http://biophysics.idtdna.com/UVSpectrum.html). 
Crystallization. Purified AdoCbl-bound CarH was crystallized in three different 
crystal forms. All crystallization procedures for AdoCbl-bound CarH were per- 
formed in a dark room under red light. Crystals of AdoCbl-bound CarH in crystal 
form 1 were obtained by the hanging drop vapour diffusion technique at 25 °C. An 
aliquot (1 µl) of a protein solution (7 mg ml⁻¹ AdoCbl-bound CarH in CarH 
buffer) was mixed with 1 µl of a precipitant solution (10% (w/v) PEG 8000, 10% 
(v/v) glycerol, 0.04 M KH2PO4) on a glass cover slip. The cover slip was sealed with 
grease over a reservoir containing 500 µl of the precipitant solution. Octahedral 
crystals appeared within 3 days and grew to maximum size within 7 days. Under 
these conditions, the protein underwent proteolysis at the linker region between 
the DNA-binding domain and the four-helix bundle, as judged by SDS-polyacry- 
lamide gel electrophoresis. The crystals consisted only of the C-terminal light- 
sensing domains. Crystals were transferred in two steps of increasing glycerol 
concentration into a cryogenic solution containing 10% (w/v) PEG 8000, 20% 
(v/v) glycerol, 0.04 M KH2PO4, 0.05 M Tris-HCl pH 8, and 0.1 M NaCl, soaked 
in that solution for 20 s, and then flash-frozen in liquid nitrogen. 

A second crystal form of AdoCbl-bound CarH was obtained by the sitting drop 
vapour diffusion technique at 25 °C. An aliquot (0.15 µl) of a protein solution 
(5.9 mg ml⁻¹ AdoCbl-bound CarH in CarH buffer, supplemented with 70 µM of 
a 31-bp DNA oligonucleotide) was mixed with 0.15 µl of a precipitant solution 
(20% (w/v) PEG 3350, 0.2 M KCl) using a Phoenix liquid handling robot (Art 
Robbins Instruments). The drop was equilibrated against 70 µl of the precipitant 
solution. Rectangular crystals appeared within 6 months. Again, the protein 
underwent proteolysis and the crystals only consisted of the C-terminal light- 
sensing domains. Crystals were transferred in two steps of increasing glycerol 
concentration into a cryogenic solution containing the precipitant supplemented 
with 20% (v/v) glycerol, soaked in that solution for 5s, and then flash-frozen in 
liquid nitrogen. 

A third crystal form of AdoCbl-bound CarH containing full-length protein 
was obtained by the sitting drop vapour diffusion technique at 25 °C. An aliquot 
(0.15 µl) of a protein solution (6 mg ml⁻¹ AdoCbl-bound CarH in CarH buffer, 
supplemented with 70 µM of a 28-bp DNA segment) was mixed with 0.15 µl of a 
precipitant solution (20% (w/v) PEG 3350, 0.1 M ammonium citrate tribasic 
pH 7) using a Phoenix liquid handling robot (Art Robbins Instruments). The 
drop was equilibrated against 70 µl of the precipitant solution. Rod crystals 
appeared within 1 month. These crystals contained full-length AdoCbl-bound 
CarH but no DNA. Crystals were transferred in three steps of increasing glycerol 
concentration into a cryogenic solution containing the precipitant supplemen- 
ted with 20% (v/v) glycerol, soaked in that solution for 10 s, and then flash- 
frozen in liquid nitrogen. 

Light-exposed CarH was crystallized by the hanging drop vapour diffusion 
technique at 25 °C. An aliquot (1 µl) of a protein solution (4.5 mg ml⁻¹ light-exposed 
CarH in CarH buffer) was mixed with 1 µl of a precipitant solution (3.4 
M NaCl, 0.1 M Bis-Tris pH 6) on a glass cover slip. The cover slip was sealed with 
grease over a reservoir containing 500 µl of the precipitant solution. Octahedral 
crystals appeared within 8 months. Crystals were transferred in three steps of 
increasing glycerol concentration into a cryogenic solution containing the pre- 
cipitant supplemented with 18% (v/v) glycerol, incubated in that solution for 10s, 
and then flash-frozen in liquid nitrogen. 

CarH bound both to AdoCbl and to a 26-bp DNA segment was crystallized by 
the hanging drop vapour diffusion technique at 25 °C. An aliquot (1 µl) of a 
protein solution (6 mg ml⁻¹ AdoCbl-bound CarH in CarH buffer, supplemented 
with 67.5 µM of a 26-bp DNA segment, 1.5-fold molar excess) was mixed with 1 µl 
of a precipitant solution (16% PEG 3350, 0.2 M L-proline, 0.1 M HEPES pH 7.5) 
on a glass cover slip. The cover slip was sealed with grease over a reservoir 
containing 500 µl of the precipitant solution. Tetragonal bipyramidal crystals 
appeared within 3 weeks. Crystals were transferred in three steps of increasing 
PEG 400 concentration into a cryogenic solution containing the precipitant sup- 
plemented with 15% (w/v) PEG 400, incubated in that solution for 20 s, and then 
flash-frozen in liquid nitrogen. 

CarH bound to both AdoCbl and a 26-bp DNA segment containing 5-iodo-deoxycytidine 
(Extended Data Fig. 6b-d) in position −25 of the sense strand 
(Extended Data Fig. 5a) was crystallized by the hanging drop vapour diffusion 
technique at 25 °C. An aliquot (1 µl) of a protein solution (5 mg ml⁻¹ AdoCbl-bound 
CarH in CarH buffer, supplemented with 94 µM of the iodine-labelled 26-bp 
DNA segment, 2.5-fold molar excess) was mixed with 1 µl of a precipitant 
solution (11.5% PEG 3350, 0.28 M L-proline, 0.1 M Tris pH 8.5) on a glass cover 
slip. The cover slip was sealed with grease over a reservoir containing 500 µl of the 
precipitant solution. Crystals appeared within 4 months. Crystals were trans- 
ferred in five steps of increasing xylitol concentration into a cryogenic solution 
containing the precipitant supplemented with 25% (w/v) xylitol, incubated in that 
solution for 30s, and then flash-frozen in liquid nitrogen. 
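
The DNA concentrations and molar excesses quoted for the two DNA-bound crystallization set-ups can be related by simple arithmetic. The text does not state the molecular mass used or whether the excess is relative to protomer or tetramer; the sketch below assumes the excess is over CarH tetramer and uses a protomer mass of about 33.3 kDa, a value chosen here only because it roughly reproduces both quoted concentrations. The function name is hypothetical.

```python
# ASSUMPTION: approximate mass of one AdoCbl-bound CarH protomer (Da).
# This value is not given in the text; it is back-calculated for illustration.
ASSUMED_PROTOMER_MASS_DA = 33_300.0


def dna_conc_uM(protein_mg_per_ml, fold_excess,
                protomer_mass_da=ASSUMED_PROTOMER_MASS_DA,
                protomers_per_complex=4):
    """DNA concentration (µM) giving fold_excess over the protein complex."""
    # mg/ml equals g/l; dividing by g/mol gives mol/l, then scale to µM.
    protomer_uM = protein_mg_per_ml / protomer_mass_da * 1e6
    complex_uM = protomer_uM / protomers_per_complex
    return fold_excess * complex_uM


print(round(dna_conc_uM(6.0, 1.5), 1))   # unlabelled 26-bp DNA set-up
print(round(dna_conc_uM(5.0, 2.5), 1))   # iodine-labelled DNA set-up
```

Under these assumptions the two results fall within 0.5 µM of the 67.5 µM and 94 µM quoted above, which is consistent with, but does not prove, the excess being defined relative to tetramer.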
Data collection and processing. All data were collected at the Advanced Photon 
Source (Argonne, Illinois, USA) at beamline 24ID-C using a Pilatus 6M pixel 
detector at a temperature of 100 K. Crystals of AdoCbl-bound CarH crystal form 
1 belong to space group P4₃2₁2. An initial AdoCbl-bound CarH crystal was used 
for a fluorescence scan to determine the Co peak wavelength for anomalous data 
collection. Another crystal was then used for collection of both native data and 
anomalous peak data. Native data were collected in a single wedge of 75° at a 
wavelength of 0.9792 Å (12,662 eV). The crystal was displaced continuously along 
its major macroscopic axis during data collection. Anomalous peak data were 
collected in a single wedge of 345° at a wavelength of 1.6039 Å (7,730 eV). The 
crystal was aligned using a mini-κ goniometer such that Bijvoet mates were 
recorded on the same frame. 

All other data except for iodine anomalous data and native data of light-exposed 
CarH (see below) were collected at a wavelength of 0.9795 Å (12,658 eV). Crystals 
of AdoCbl-bound CarH crystal form 2 belong to space group P2₁2₁2₁. Data were 
collected in a single wedge of 100°. Crystals of AdoCbl-bound CarH crystal form 3 
belong to space group P1. Data were collected in a single wedge of 270° and the 
crystal was displaced continuously along its major macroscopic axis during data 
collection. Crystals of light-exposed CarH belong to space group I4₁22 and data 
were collected at a wavelength of 0.9791 Å (12,663 eV) in a single wedge of 150°. 
Crystals of DNA-bound CarH both with and without the iodine label belong to 
space group P2₁2₁2. Data for crystals with unlabelled DNA were collected in a 
single wedge of 180°. Data for crystals of CarH in complex with iodine-labelled 
DNA were collected at a wavelength of 1.7365 Å (7,140 eV) in a single wedge of 
200° and the crystal was displaced continuously along its major macroscopic axis 
during data collection. 
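
The wavelength and photon-energy pairs quoted in this section are related by E = hc/λ. As a quick consistency check, the sketch below converts each wavelength to energy using hc ≈ 12,398.42 eV·Å; the function name is hypothetical and the constant is a standard rounded value, not something taken from the paper.

```python
HC_EV_ANGSTROM = 12398.42  # Planck constant times speed of light, in eV·Å


def wavelength_to_energy_ev(wavelength_angstrom):
    """Photon energy (eV) for an X-ray wavelength given in ångströms."""
    return HC_EV_ANGSTROM / wavelength_angstrom


# Wavelength (Å) -> energy (eV) pairs as quoted in the text.
QUOTED = {
    0.9792: 12662,   # native data, crystal form 1
    1.6039: 7730,    # Co anomalous peak
    0.9795: 12658,   # most other data sets
    0.9791: 12663,   # light-exposed CarH
    1.7365: 7140,    # iodine anomalous data
}

for wavelength, quoted_ev in QUOTED.items():
    print(wavelength, round(wavelength_to_energy_ev(wavelength)), quoted_ev)
```

Each computed energy rounds to the value quoted in the text.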

Data for the AdoCbl-bound CarH (crystal form 1) Co peak data set were 
integrated in HKL2000 and scaled in Scalepack. Data for all other data sets were 
integrated in XDS and scaled in XSCALE. Data collection statistics are summarized 
in Extended Data Table 1. 
Structure building and refinement. The structure of AdoCbl-bound CarH in 
crystal form 1 (space group P4₃2₁2) was determined to 2.80 Å resolution using 
single-wavelength anomalous diffraction. Positions of two cobalt sites, corresponding 
to two CarH protomers in the asymmetric unit, were located using 
ShelxD in the HKL2MAP shell and refined using SHARP/autoSHARP. 
The initial overall figure of merit (acentric) was calculated by SHARP to be 
0.43 to 5.1 Å resolution. Experimental maps from the SHARP output, solvent 
flattened using SOLOMON and extended to 3.3 Å resolution, were of sufficient 
quality to place two copies of the Cbl-binding domain of MetH (PDB accession 
number 1BMT, residues 745-868), eight additional helices, and AdoCbl in the 
electron density. This initial model was used to better define solvent boundaries in 
another round of solvent flattening in SOLOMON. Using the resulting electron 
density, loop regions were modified and side chains with visible electron density 
were added. A near-complete model of AdoCbl-bound CarH (containing 374 
amino-acid residues and bound AdoCbl) was then used for rigid body refinement 
in Phenix against the native AdoCbl-bound CarH data set (crystal form 1) using 
data from 100 to 2.80 Å resolution. The resulting R-factors were 42.0% and 44.1% 
for the working and the free R-factor, respectively. The model was refined by 
manual adjustment in Coot until rigid body refinement in Phenix yielded 
R-factors of 30.8% and 34.7% for the working and the free R-factor, respectively. 
Subsequent cycles of refinement included positional refinement with non-crys- 
tallographic symmetry restraints and individual B-factor refinement in Phenix 
until the R-factors were 20.9% and 24.2% for the working and the free R-factor, 
respectively. This model was not refined to completion. The near-complete model 
was used to determine the structures of AdoCbl-bound CarH in crystal form 2 
(space group P2₁2₁2₁) and crystal form 3 (space group P1), which are of higher 
resolution (crystal form 2) or contain the full-length protein (crystal form 3). 

The structure of AdoCbl-bound CarH in crystal form 2 was determined to 
2.15 Å resolution by molecular replacement in Phaser. The structure in crystal 
form 2 contains four CarH protomers in the asymmetric unit, corresponding to a 
tetramer. After molecular replacement, ten cycles of simulated annealing refine- 
ment were performed in Phenix to remove model bias. The model was then 
refined by iterative cycles of manual adjustment in Coot and refinement in 
Phenix. Initially, strict non-crystallographic symmetry restraints were applied 
for the two head-to-tail dimers in the asymmetric unit. Subsequently, these 
restraints were loosened for residues that are in unique environments either 
because of the asymmetric tetramer architecture or because of crystal contacts. 
In advanced stages of refinement, water molecules were added manually in Coot 
and refined in Phenix, with placement of additional water molecules until their 
number was stable. Final cycles of refinement included TLS parametrization 
with one TLS group per CarH protomer. 

The structure of AdoCbl-bound CarH in crystal form 3 was determined to 
2.80 Å resolution using molecular replacement. First, two CarH tetramers were 
placed in the asymmetric unit using Phaser, accounting for all eight protomers in 
the asymmetric unit. Subsequently, four CarH DNA-binding domains from the 
structure of light-exposed CarH (see below) were placed using Phaser. After 
refinement in Phenix, there was clear electron density for an additional DNA- 
binding domain as well as fragments of the three remaining DNA-binding 
domains, accounting for all eight DNA-binding domains in the asymmetric unit. 
The model was refined by iterative cycles of manual adjustment in Coot and 
refinement in Phenix. Strict non-crystallographic symmetry restraints were 
applied for all CarH protomers in the asymmetric unit and loosened in later 
stages of refinement as described above. No water molecules were added to this 
structure. Final cycles of refinement included TLS parametrization. For each 
CarH protomer, the light-sensing domain was defined as a single TLS group and, 
if fully present, the DNA-binding domain was defined as an additional TLS 
group. 

The structure of light-exposed CarH was determined to 2.65 Å resolution by 
molecular replacement in Phaser using consecutive searches for the CarH Cbl- 
binding domain, the four-helix bundle, and the first conformation of the NMR 
structure of the CarA DNA-binding domain (PDB accession number 2JML). 
The structure contains one protomer in the asymmetric unit and all three 
domains could be placed unambiguously. Ten cycles of simulated annealing 
refinement were performed in Phenix. The model was then refined by iterative 
cycles of manual adjustment in Coot and refinement in Phenix. In advanced 
stages of refinement, water molecules were added manually in Coot and refined 
in Phenix, with placement of additional water molecules until their number was 
stable. Final cycles of refinement included TLS parametrization with one TLS 
group. 

The structure of CarH bound to AdoCbl and a 26-bp DNA segment was determined 
to 3.89 Å resolution by molecular replacement in Phaser using consecutive 
searches for two CarH tetramers without the DNA-binding domains and for 
two 26-bp DNA segments (models generated by the 3D-DART server; 
http://haddock.science.uu.nl/services/3DDART/). After molecular replacement, there 
was clear electron density for six DNA-binding domains in the asymmetric unit, 
which were positioned manually in the electron density from the structure of light- 
exposed CarH. The model was refined by iterative cycles of manual adjustment in 
Coot and refinement in Phenix. B-factors were refined grouped by residue and 
positions of individual atoms were restrained using non-crystallographic sym- 
metry restraints. Planarity and hydrogen bonding restraints were applied to 
DNA base pairs. Final cycles of refinement included TLS parametrization using 
one TLS group for each CarH protomer and each DNA segment. Anomalous 
difference maps, calculated from data collected on crystals that contained a 
DNA segment with an iodine label at position −25 (Extended Data Fig. 5a), were 
used to unambiguously determine the orientation of the DNA segment in the 
crystal structure and thus validate the sequence assignments. Maps, calculated 
using FFT in the CCP4 software suite, revealed a strong anomalous difference 
density peak at one position for each of the two CarH-DNA complexes in the 
asymmetric unit, allowing for position −25 of the sense strand to be assigned in the 
structure (Extended Data Fig. 6b-d). Note that the iodine-labelled DNA segment 
differed slightly from the DNA segment used in the structure determination 
(Extended Data Fig. 5a), but both crystallize in the same space group and with 
the same crystal packing. 

Parameter files for cobalamin were provided by O. Smart at Global Phasing. 
Refinement restraints for the 5'-dAdo group were generated using the Grade Web 
Server (Global Phasing). 

Crystallographic refinement of all CarH structures yielded models possessing 
low free R-factors, excellent stereochemistry, and small root mean square devia- 
tions from ideal values for bond lengths and angles. In all models, side chains 
without visible electron density were truncated to the last atom with electron 
density, and amino acids without visible electron density were not included in the 
model. All refinement statistics are summarized in Extended Data Table 1. The 
models were validated using simulated annealing composite omit maps (AdoCbl- 
bound CarH, light-exposed CarH) or regular refinement composite omit maps 
(DNA-bound CarH) calculated in CNS and Phenix. Model geometry was analysed 
using MolProbity and ProCheck. Analysis of the Ramachandran statistics 
using MolProbity indicated that for AdoCbl-bound CarH (crystal form 2), 
98.1%, 1.9%, and 0.0% of residues are in the favoured, allowed, and disallowed 
regions, respectively; for AdoCbl-bound CarH (crystal form 3), 97.7%, 2.3%, and 
0.0% of residues are in the favoured, allowed, and disallowed regions, respectively; 
for light-exposed CarH, 97.8%, 2.2%, and 0.0% of residues are in the favoured, 
allowed, and disallowed regions, respectively; and for AdoCbl- and DNA-bound 
CarH, 97.1%, 2.7%, and 0.2% of residues are in the favoured, allowed, and dis- 
allowed regions, respectively. The larger number of residues in the disallowed 
region of the Ramachandran plot of DNA-bound CarH is due to the modest 
resolution of the structure. Figures were generated using PyMOL. Interfaces 
between subunits were analysed using the ‘Protein interfaces, surfaces and 
assemblies’ service PISA at the European Bioinformatics Institute 
(http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html). Crystallography software 
packages were compiled by SBGrid. 

DNA-binding assays. All DNA binding assays were repeated three to five times for 
each experimental condition. EMSAs were performed in the dark as described 
previously. A 177-bp DNA probe PCR-amplified using primers with one 5'-end 
³²P-labelled with T4 polynucleotide kinase (T4PK; Takara) before the PCR or 
shorter HPLC-purified synthetic probes (Biolegio) were used in the EMSAs. With 
the latter, one strand was ³²P-labelled at the 5'-end with T4PK and then mixed with 
a twofold excess of the unlabelled complementary strand to ensure that all of the 
labelled strand was present as double-stranded probe. The strand mixture was 
incubated at 100 °C for 2 min and then left to cool slowly for hybridization. 
For EMSAs, a 20 µl reaction volume containing the DNA probe (1.2 nM, approximately 
13,000 counts per minute) and protein with a fivefold excess of AdoCbl in 0.1 
M KCl, 0.025 M Tris·HCl, pH 8, 1 mM DTT, 10% (v/v) glycerol, 200 ng µl⁻¹ BSA, 
and 1 µg of sheared salmon sperm DNA as non-specific competitor was incubated 
for 30 min at 65 °C (177-bp probe) or 30 °C (shorter probes). Samples were then loaded 
onto 6% native polyacrylamide gels (37.5:1 acrylamide:bisacrylamide) pre-run for 
30 min in 0.5× TBE buffer (0.045 M Tris base, 0.045 M boric acid, 1 mM EDTA) 
and subjected to electrophoresis for 1.5 h at 200 V, 10 °C. Gels were vacuum-dried 
and analysed by autoradiography. Autoradiograms were scanned using an Image 
Scanner II imager with LabScan 5.0 software (GE Healthcare). Band intensities were 
quantified using ImageJ (NIH) with those of free DNA used to estimate the fraction 
bound, which was fitted to the three-parameter Hill equation using SigmaPlot 
(Systat Software) to estimate KD, the apparent equilibrium dissociation constant 
equivalent to the protein concentration for half-maximal binding, and n, the Hill 
coefficient. The latter, for example expected to be 2 for dimer or 4 for tetramer 
DNA-binding models, can vary owing to cooperativity effects, contributions from 
monomer-tetramer equilibria, and/or deviations from true equilibrium. 
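
The quantification described above (fraction bound estimated from the free-DNA band, then fitted to a three-parameter Hill equation) can be sketched as follows. This is a minimal illustration, not the SigmaPlot procedure used in the study: it assumes the common parametrization f = Bmax·Pⁿ / (K_Dⁿ + Pⁿ), replaces nonlinear least squares with a brute-force grid search, and uses hypothetical function names. The synthetic data are generated from the wild-type values quoted in Extended Data Fig. 4h (67 nM, Hill coefficient 5.1) purely to exercise the code.

```python
def fraction_bound(free_intensity, free_intensity_no_protein):
    """Fraction of probe bound, from depletion of the free-DNA band."""
    return 1.0 - free_intensity / free_intensity_no_protein


def hill(protein_nM, kd_nM, n, b_max=1.0):
    """Three-parameter Hill equation: b_max * P^n / (Kd^n + P^n)."""
    if protein_nM <= 0:
        return 0.0
    return b_max * protein_nM ** n / (kd_nM ** n + protein_nM ** n)


def fit_hill(concs_nM, fractions, kd_grid, n_grid, bmax_grid):
    """Grid search for (Kd, n, Bmax) minimizing the sum of squared errors.

    Assumption: a coarse grid stands in for a proper nonlinear fit.
    """
    best = None
    for kd in kd_grid:
        for n in n_grid:
            for b_max in bmax_grid:
                sse = sum((hill(c, kd, n, b_max) - f) ** 2
                          for c, f in zip(concs_nM, fractions))
                if best is None or sse < best[0]:
                    best = (sse, kd, n, b_max)
    return best[1], best[2], best[3]


# Synthetic, noise-free titration (hypothetical concentrations in nM).
concs = [20, 40, 60, 67, 80, 100, 150, 300]
data = [hill(c, 67.0, 5.1) for c in concs]
kd, n, b_max = fit_hill(concs, data,
                        kd_grid=range(50, 91),
                        n_grid=[x / 10 for x in range(10, 81)],
                        bmax_grid=[0.9, 0.95, 1.0])
print(kd, n, b_max)
```

By construction the fitted K_D is the protein concentration at which the curve reaches half of Bmax, which matches the definition of the apparent dissociation constant given in the text.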

DNase I and hydroxyl radical footprinting. DNase I and hydroxyl radical footprinting 
analyses were performed under solution conditions similar to EMSA using 
previously described protocols. A 130-bp CarH operator-promoter DNA probe 
(Extended Data Fig. 5a) was ³²P-radiolabelled at the 5' end of its sense or anti-sense 
strand by PCR using appropriately labelled primers, as described above. For DNase I 
footprinting, 20 µl of ³²P-radiolabelled DNA probe (~20,000 counts per minute) 
with 800 nM CarH and fivefold excess of AdoCbl in EMSA buffer lacking glycerol 
and with 0.01 M MgCl₂ were incubated for 30 min at 37 °C, then treated with 0.07 
units of DNase I for 2 min and finally quenched with 0.025 M EDTA. For hydroxyl 
radical footprinting, samples (as for DNase I footprints but without MgCl₂) were 
treated with 2 µl each of freshly prepared Fe(II)-EDTA solution (1 mM ammonium 
iron(II) sulfate, 2 mM EDTA), 0.01 M sodium ascorbate, and 0.6% hydrogen peroxide 
for 4 min at 25 °C. The reaction was stopped with 2 µl each of 0.1 M thiourea 
and 0.5 M EDTA (pH 8). Footprinting reactions were done under dim light and, after 
quenching, under normal light. DNA from each sample was ethanol precipitated, 
washed twice with 70% ethanol, dried, and resuspended in formamide loading buffer. 
The 5 µl samples were heated at 95 °C for 3 min and loaded onto a 6% polyacrylamide-8 M 
urea sequencing gel together with G + A chemical sequencing ladders. 
Gels were vacuum-dried and analysed by autoradiography, and the bands quanti- 
tated using GelAnalyzer 2010a (http://www.gelanalyzer.com). Each experiment was 
repeated at least three times. 

Analytical SEC. Analytical SEC for all CarH mutants except for H132A CarH was 
performed using an ÄKTAbasic unit and a Superdex 200 analytical SEC column 
(GE Healthcare). The calibration curve was log(Mr) = 7.885 − 0.221Ve, where 
Mr is the apparent molecular mass and Ve is the elution volume. Pure protein 
(100 µl, 50-100 µM) was incubated with a fivefold molar excess of AdoCbl for at 
least 15 min and analysed by SEC in the dark or after light irradiation for 5 min 
with white light from fluorescent lamps at 10 W m⁻². Elution at 0.4 ml min⁻¹ 
flow rate was tracked by absorbance at 280 nm and 522 nm, and Mr was estimated 
from Ve. Each SEC experiment was performed at least three times. 
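
The calibration above is a straight line relating the logarithm of the apparent molecular mass to the elution volume, with intercept 7.885 and slope −0.221 for the Superdex 200 column. A minimal sketch of how such a curve converts between the two quantities is given below. It assumes a base-10 logarithm, mass in Da and volume in ml, none of which is stated explicitly in the text, and the function names are hypothetical.

```python
import math


def apparent_mass(elution_ml, intercept=7.885, slope=0.221):
    """Apparent molecular mass (Da) from log10(M) = intercept - slope * Ve.

    Defaults are the Superdex 200 calibration quoted in the text;
    base-10 logarithm and ml units are assumptions.
    """
    return 10 ** (intercept - slope * elution_ml)


def elution_volume(mass_da, intercept=7.885, slope=0.221):
    """Inverse of apparent_mass: expected elution volume for a given mass."""
    return (intercept - math.log10(mass_da)) / slope


# Expected elution volumes for the species sizes reported in the
# Extended Data legends (137 kDa tetramer, 39 kDa monomer).
for label, mass in [("tetramer", 137_000), ("monomer", 39_000)]:
    print(label, round(elution_volume(mass), 2))
```

Under these assumptions the tetramer would elute near 12.4 ml and the monomer near 14.9 ml. Passing a different intercept and slope applies the same conversion to another column.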

Analytical SEC for H132A CarH was performed using an ÄKTA FPLC unit 
and a Superose 6 10/300 GL column (GE Healthcare) equilibrated with CarH 
buffer. The calibration curve was log(Mr) = 9.74 − 0.30Ve. WT or H132A CarH 
(300 µl, 20-50 µM) with stoichiometric AdoCbl with or without exposure to 
white light for 1 h were injected onto the column and elution was tracked by 
absorbance at 280 nm. For AdoCbl exchange studies, WT or H132A CarH sam- 
ples were exposed to light as described and then incubated with a tenfold excess of 
free AdoCbl for the given periods and temperatures and analysed by SEC. 

Solution UV-vis spectroscopy. Solution UV-vis spectra were recorded at 25 °C on 
a SpectraMax Plus 384 (Molecular Devices) using SoftMax Pro 5 software (Molecular 
Devices) and a 1 cm path length quartz cuvette (Starna). WT or H132A CarH in 
CarH buffer were transferred to the cuvette under red light or after exposure to white 
light for 20 min and UV-vis spectra were recorded from 250 to 800 nm. The spec- 
trum of pure CarH buffer was used for background subtraction. No photolysis 
occurred on the timescale of spectrum acquisition, as repeated acquisition did not 
lead to spectral changes. 

Spectra of Cbl with increasing imidazole concentrations, similar to spectra 
reported previously, were obtained with the same experimental parameters. 
Cbl solutions contained 50 µM OHCbl·HCl (Sigma) in 50 mM Tris with 
0 mM, 0.4 mM, or 400 mM imidazole, adjusted to a final pH of 8 to match the 
protein solutions. All solutions were incubated for 16 h at 25 °C to allow complete 
ligand exchange to take place. 

Single-crystal UV-vis spectroscopy. Single-crystal UV-vis spectra were 
recorded at a temperature of 100 K at Stanford Synchrotron Radiation 
Laboratory beamline 11-1 (Menlo Park, California, USA) using a UV-vis 
microspectrophotometer. The setup used a Hamamatsu light source (50 µm light spot) 
with deuterium and halogen lamps, UV solarization-resistant optical fibres, 
reflective Newport Schwarzschild objectives, and an Ocean Optics QE65000 
Spectrum Analyzer. Spectra were acquired as 50 averages with an integration 
time of 0.03 s and a boxcar width of 3. A crystal of AdoCbl-bound CarH was 
cryoprotected, transferred to a nylon fibre loop, and frozen in liquid nitrogen as 
described above. A background spectrum was acquired on a region of the fibre 
loop containing just cryoprotectant. A sample spectrum was then acquired on the 
crystal. 
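
The acquisition settings above (50 averaged scans, a boxcar width of 3, and a background spectrum taken through cryoprotectant) correspond to a simple processing chain, sketched below as a rough illustration. It assumes that a boxcar width of w means averaging each pixel with up to w neighbours on either side, and that absorbance is computed as −log10 of the sample-to-background intensity ratio. Both are assumptions about the instrument software rather than details given in the text, and all names and numbers are hypothetical.

```python
import math


def average_scans(scans):
    """Average repeated scans pixel by pixel (e.g. 50 acquisitions)."""
    n = len(scans)
    return [sum(pixel_values) / n for pixel_values in zip(*scans)]


def boxcar(values, width):
    """Smooth by averaging each point with up to `width` neighbours per side.

    Windows are truncated at the ends of the spectrum.
    """
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - width)
        hi = min(len(values), i + width + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def absorbance(sample, background):
    """Absorbance per pixel from sample and background intensities."""
    return [-math.log10(s / b) for s, b in zip(sample, background)]


# Hypothetical intensities for a five-pixel detector, two scans each.
background = boxcar(average_scans([[100.0] * 5, [102.0] * 5]), width=3)
sample = boxcar(average_scans([[50.0, 40.0, 20.0, 40.0, 50.0],
                               [52.0, 42.0, 22.0, 42.0, 52.0]]), width=3)
print([round(a, 3) for a in absorbance(sample, background)])
```

Smoothing the raw intensities before taking the ratio, as done here, is one possible ordering. The text does not say at which stage the boxcar filter was applied.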

Extended Data Figure 1 | CarH crystals contain intact AdoCbl. a, UV-vis 
spectra obtained from AdoCbl-bound CarH crystals at T = 100 K (red trace) or 
AdoCbl-bound CarH in solution at T = 298 K (black trace) exhibit good 
qualitative agreement and similar features, including a peak centred around 
540 nm with a shoulder around 560 nm. Because many band intensities are 
orientation-dependent and the crystal spectrum changes with orientation but 
molecules are rotationally averaged in solution, quantitative comparison of 
the spectra is difficult. Note also that individual bands appear sharper in the 
crystal spectrum because the molecules have fewer rotational degrees of 
freedom and because fewer vibrational states are populated at T = 100 K. 
b, Simulated annealing composite omit electron density (2.15 Å resolution) 
contoured around AdoCbl at 1.0σ (grey). The electron density covers the entire 
AdoCbl molecule including the Co-C bond, indicating that the Co-C bond 
remained intact during crystallization and data collection. AdoCbl is shown in 
stick representation with Cbl carbons in pink and 5'-dAdo group carbons in 
cyan. Co is shown as a purple sphere. The Co-coordinating His177 is shown in 
sticks with carbons in green. CarH is shown in ribbons with the helix bundle in 
yellow and Cbl-binding domain in green. 

Extended Data Figure 2 | The CarH DNA-binding domain is flexible in 
the absence of DNA. a, Overlay of five CarH protomers, including the 
protomer shown in Fig. 2a, highlighting flexibility of DNA-binding domains. 
Structures are aligned by the Cbl-binding domains (green) and helix bundles 
(yellow) and shown in the same orientation as Fig. 2a. DNA-binding domains 
are coloured in dark cyan, light cyan, dark blue, black, and grey. AdoCbl is 
shown with Cbl carbons in pink, 5'-dAdo group carbons in cyan, and cobalt in 
purple. b-e, Individual CarH protomers shown side by side. Orientation 
and colouring as in a. 

Extended Data Figure 3 | CarH mutant analysis and multiple sequence 
alignment. a, Results summary for in vitro CarH mutant analysis. Table 
footnotes are as follows: *oligomerization was probed by SEC and DNA 
binding by gel shift analysis. †Y30A, H42A: weakened binding at 100 nM 
protein. ‡G160Q, G192Q: dimer, no tetramer. §G160Q, G192Q: binds with 
reduced affinity and cooperativity and as a higher mobility (smaller size) 
complex. b, Alignment of CarH sequences from different bacterial species. 
Sequence identity is shown in white font with red background, sequence 
similarity in red font. Coloured triangles highlight functionally important 
positions, with filled triangles indicating residues analysed by mutagenesis in 
this study and empty triangles indicating residues not analysed by mutagenesis. 
Mutating the highly conserved His177 of the Cbl-binding motif, the lower axial 
ligand of bound AdoCbl, has previously been shown to impair AdoCbl binding 
and tetramerization. Colouring is as follows: hydrogen bonds/ionic 
interactions to DNA, orange; contact to 5'-dAdo, green; histidines 
coordinating Cbl (His132 only coordinates after light exposure), red; hydrogen 
bonds/ionic interactions at dimer interface, black; hydrogen bonds/ionic 
interactions as well as Gly160 and Gly192 at the dimer-dimer interface, cyan. 
Residues involved in more than one type of interaction are coloured half/half. 
Residues at protein interfaces are less well conserved than other functionally 
important residues, probably because compensatory mutations and local 
structural deformations are possible. Note, however, that the T. thermophilus 
Arg176-Asp201 pair observed in our structure is inverted in Myxococcus 
xanthus, suggesting that the interaction is conserved. Alignment generated 
using ESPript. 

Extended Data Figure 4 | Characterization of CarH mutants affecting 
oligomerization state. a-f, SEC traces (Superdex 200 analytical SEC column) 
of CarH carrying mutations a, b, near the 5'-dAdo group; c, d, at the 
head-to-tail dimer interface; and e, f, at the dimer-dimer interface. Shown are 
traces of mutants incubated with AdoCbl in the dark (top panels) and after light 
exposure (bottom panels). In all panels, both absorbance A280 nm (tracking 
protein) and A522 nm (tracking Cbl) traces are shown. Molecular masses are 
calculated from the observed elution volumes as described in Methods, and are 
consistent with a tetrameric species (137 kDa), a dimeric species (89 kDa), and a 
monomeric species (39 kDa). Notably, mutant CarH proteins that do not 
tetramerize in the presence of AdoCbl (D201R, H142A) also do not appear to 
bind AdoCbl (see 522 nm traces of dark samples). This finding is consistent 
with previous studies that show cooperativity of AdoCbl binding and 
tetramerization, a feature that does not hold for other forms of Cbl 
(methylcobalamin, CNCbl and Cbl). Both of these mutant proteins can still 
bind Cbl (see 522 nm traces of light-exposed samples), which further suggests 
that these mutants are properly folded and that the lack of AdoCbl binding 
stems from inability to oligomerize. Although the degree of tetramerization of 
CarH mutant proteins in the dark varied, all of these mutant proteins form 
Cbl-bound monomers after light exposure. g, DNA-binding capacity of WT 
and mutant proteins (800 nM) as determined by EMSAs after incubation 
with AdoCbl (4 µM) in the dark. h, EMSA data for WT CarH and the G160Q 
and G192Q mutants fit to the Hill equation, as described in Methods. KD 
(in nM) and Hill coefficients from the fits are, respectively, (67 ± 2) and 
(5.1 ± 0.7) for WT CarH, (111 ± 18) and (3.0 ± 0.2) for G160Q CarH, and 
(253 ± 17) and (2.5 ± 0.3) for G192Q CarH. The data shown are the mean 
values and standard errors of three to five repeat experiments. 

Extended Data Figure 5 | Identification and validation of CarH operator 
sequence by EMSAs and footprinting. a, Location of CarH operator in the 
intergenic region between carH and the carotenogenic crtB of the 
T. thermophilus genome. Structural and biochemical data are mapped onto 
the sequence. Three 11-bp CarH binding sites are shown in cyan font and the 
promoter −35 element is highlighted with a red box. Nucleotides protected 
from hydroxyl radical cleavage are indicated with bullets. The ~42-nucleotide 
DNase I footprint on the sense strand is shown above the sequence and that 
of the antisense strand has been omitted for clarity. Nucleotide numbering on 
the sense strand is relative to the carH transcription start site (underlined, 
+1). To identify suitable DNA constructs for crystallization, operator 
sequences were systematically trimmed around a ~40-bp segment, as indicated 
by the black bars, and binding was assessed by EMSAs (shown in b). The 
sequences of two 26-bp DNA segments used for co-crystallization are also 
shown. The blunt-ended 26-bp segment was used for determination of the 
CarH-DNA structure. The second 26-bp segment contained one-nucleotide 
3'-overhangs and 5-iodo-deoxycytidine in position −25 (red) and was used to 
validate the mode of DNA binding. b, Binding of CarH (800 nM) to DNA 
segments of different lengths after incubation with AdoCbl (4 µM) in the dark. 
Substantial DNA binding was observed for a probe as small as 30 bp. c, DNase I 
and hydroxyl radical footprints of CarH on a 130-bp operator DNA segment. 
Disappearance of bands in the presence of CarH indicates protection from 
cleavage. Protected regions are marked on the side and were mapped onto the 
operator sequence using G + A chemical sequencing experiments performed 
in parallel. d, e, CarH binding to 40-bp operators carrying mutations. 
d, Sequences of tested operator variants. WT operator sequence shown at the 
top and bottom, with repeat sequences that CarH recognizes shown in cyan; 
6-bp stretch contacted by CarH recognition helix is boxed. Mutations are as 
follows: Mut1-7: single (1-3), pairwise (4-6), and triple (7) mutations of AC to 
GT (positions 8/9); Mut8-14: single (8-10), double (11-13), and triple (14) 
mutations of (A/C)T to GC (positions 4/5); Mut15-18: pairwise (15-17) and 
triple (18) mutations of (A/G)A to TT (positions 1/2). e, EMSAs with WT 
CarH (800 nM) and each of the 40-bp operator variants after incubation with 
AdoCbl (4 µM) in the dark. Note that two additional lower mobility complexes 
are observed, most apparent with the WT operator and its variants with 
comparable binding. The origin of these complexes is unknown, but they 
probably arise from oligomeric equilibria and residual amounts of light- 
exposed protein in the sample. 

Extended Data Figure 6 | CarH DNA binding, conformational changes 
upon binding, and comparison with BmrR. a, 2Fo − Fc omit electron density 
(3.89 Å resolution) for DNA-bound CarH, calculated after performing full 
refinement of the model with DNA omitted and contoured at 1.0σ. DNA is 
shown with carbons in yellow and recognition helix of a CarH DNA-binding 
domain with carbons in cyan. b, c, Validation of DNA-binding mode using 
heavy-atom-derivatized DNA segments. CarH was crystallized with a DNA 
segment containing 5-iodo-deoxycytidine in position −25 of the sense strand. 
Shown is the resulting anomalous difference density (purple mesh), contoured 
at 6σ, for both CarH-DNA complexes in the asymmetric unit, with peaks 
directly adjacent to the C5 atom of deoxycytidine in position −25. d, Chemical 
structure of 5-iodo-deoxycytidine. e, Comparison of CarH before and after 
DNA binding, revealing rearrangement of DNA-binding domains. CarH 
before DNA binding is shown with helix bundles and Cbl-binding domains in 
grey and DNA-binding domains in pink. CarH bound to DNA is shown with 
helix bundles and Cbl-binding domains in green and DNA-binding domains in 
cyan. The fourth DNA-binding domain of DNA-bound CarH is disordered 
and not modelled. DNA is shown in yellow. AdoCbl is shown with Cbl carbons 
in pink and 5'-dAdo group carbons in cyan. f, Contacts between residues in 
neighbouring DNA-binding domains, coloured by domain. Each interface 
between two DNA-binding domains buries 280 Å² of surface from solvent on 
each DNA-binding domain. Interactions of Arg72 to Tyr7 and Glul11 are 
indicated by black dashed lines. Colouring as in e. g, h, Models of individual 
CarH head-to-tail dimers bound to DNA. g, Head-to-tail dimer contributing 
the middle of the three DNA-binding domains, coloured by domain with 
DNA-binding domain in cyan, helix bundles in yellow, and Cbl-binding 
domains in green. The DNA-binding domain of the second protomer (right) is 
disordered and not modelled. DNA and AdoCbl are shown as in e. h, 
Head-to-tail dimer contributing the flanking DNA-binding domains. Helix 
bundles and Cbl-binding domains are shown in grey, remaining colouring as in 
e. i, BmrR bound to DNA (PDB accession number 1EXJ). A BmrR dimer 
is shown in ribbon representation in orange and red. DNA is shown in yellow. 
BmrR binds as a dimer to a palindromic sequence and distorts the DNA 
double strand from its ideal conformation. 

Extended Data Figure 7 | In vitro characterization of CarH DNA binding 
mutants. a, b, SEC traces (Superdex 200 analytical SEC column) of CarH 
carrying mutations in the DNA-binding domain. Shown are traces of mutants 
(a) incubated with AdoCbl in the dark and (b) after light exposure. In all panels, 
both A280 nm (tracking protein) and A522 nm (tracking Cbl) traces are shown. 
Molecular masses are calculated from the observed elution volumes as 
described in Methods, and are consistent with a tetrameric species (137 kDa) 
and a monomeric species (39 kDa). c, DNA-binding capacity of mutants 
(800 nM) as determined by EMSAs after incubation with AdoCbl (4 µM) in 
the dark. 

Extended Data Figure 8 | Light-exposed CarH has bis-His ligated Cbl. 
a, Structure of light-exposed CarH including the DNA-binding domain (cyan) 
and other domains coloured as in Fig. 5a. b, Close-up view of the Cbl in light- 
exposed CarH, with both coordinating His side chains shown in sticks. 
Simulated annealing composite omit electron density (2.65 Å resolution) is 
shown in blue, contoured at 1.0σ. c, UV-vis spectra of light-exposed WT 
CarH (black) and H132A CarH (red) exhibit pronounced differences, 
indicating that the bis-His ligation is also formed in solution. d, UV-vis spectra 
of free OHCbl (50 µM) with increasing imidazole concentration. The spectrum 
of bis-imidazole ligated Cbl (black, Cbl with 400 mM imidazole contains 60% 
bis-imidazole ligated Cbl and 40% Cbl with dimethylbenzimidazole and 
imidazole as ligands) resembles that of light-exposed WT CarH, whereas the 
spectrum of free OHCbl (pink) resembles that of light-exposed H132A CarH. 
Note that the latter two are expected to be slightly different because free OHCbl 
contains a dimethylbenzimidazole group as the lower axial ligand, whereas 
light-exposed H132A CarH contains a histidine imidazole as the lower axial 
ligand. Experimental conditions chosen were similar to those reported 
elsewhere. e, UV-vis spectra of AdoCbl-bound WT CarH (black) and H132A 
CarH (red) are virtually identical, suggesting that the mode of AdoCbl binding 
is unchanged, as is expected from the structure. f, g, Size-exclusion 
chromatograms (Superose 6 10/300 GL column) of AdoCbl-bound and light- 
exposed (f) WT CarH and (g) H132A CarH, demonstrating that H132A CarH, 
like WT CarH, forms a tetramer in the dark and undergoes light-dependent 
tetramer disassembly. 

Extended Data Figure 9 | Disruption of bis-His ligation by H132A 
mutation facilitates Cbl dissociation after photolysis. a, b, WT and H132A 
CarH were exposed to light, rendering them monomeric, and then incubated 
with free AdoCbl at the indicated temperatures and periods. For AdoCbl to 
bind to CarH and induce tetramerization, the photolysed Cbl has to dissociate 
from the protein first. Thus, the extent of tetramer formation, as assessed by 
SEC (Superose 6 10/300 GL column), is indicative of the affinity of the protein 
for photolysed Cbl. That is, lack of tetramer formation in the presence of fresh 
AdoCbl indicates that the photolysed Cbl is still bound to the protein. The 
observed differential in tetramer formation between H132A and WT CarH is 
substantial: WT CarH retains its photolysed Cbl, showing only a small amount 
of tetramer formation, whereas H132A CarH loses its photolysed Cbl, 
reforming tetramers upon AdoCbl addition. c, ESI-TOF mass spectra of WT 
and H132A CarH after light exposure also reveal differential affinity for 
photolysed Cbl. Light-exposed WT CarH is 1,329 Da larger in molecular mass 
than light-exposed H132A CarH, a number corresponding to the molecular 
mass of Cbl. This difference in molecular mass suggests that, even under the 
harsh conditions of this experiment, photolysed Cbl remains bound to the 
WT CarH monomer but not to H132A CarH, indicating that Cbl dissociates 
more readily without the bis-His ligation. The minor peak next to WT CarH 
arises from protein bound to a potassium ion (mass shift 39 Da). Species 
marked with an asterisk correspond to an unidentified impurity in the H132A 
CarH sample. d, Control experiment showing ESI-TOF mass spectra of WT 
and H132A CarH in the AdoCbl-bound dark state. The very similar molecular 
masses obtained (differing only because of the H132A mutation) indicate that 
the mutation has no effect on AdoCbl binding, consistent with the fact that 
His132 is not coordinated to Cbl when the upper 5’-dAdo ligand is present. 
Both WT and H132A CarH lose their AdoCbl cofactor as the tetramer 
disassembles into monomeric units. 

Extended Data Table 1 | Crystallographic data collection and refinement statistics 

                        AdoCbl-bound         AdoCbl-bound         AdoCbl-bound         AdoCbl-bound         AdoCbl- and          AdoCbl- and          Light-exposed
                        CarH form 1          CarH form 1          CarH form 2          CarH form 3          DNA-bound CarH       DNA-bound CarH       CarH
                        Native *             Co Peak *                                                                           (iodine-labeled) **
PDB code                                                          5C8A                 5C8D                 5C8E                                      5C8F
Data collection
Space group             P4;2;2               P4322                P2₁2₁2₁              P1                   P2₁2₁2               P2₁2₁2               I4₁22
Cell dimensions
  a, b, c (Å)           94.5, 94.5, 180.5    94.5, 94.5, 180.6    51.4, 99.7, 144.0    78.7, 79.7, 118.4    177.9, 141.8, 162.7  176.7, 141.7, 162.6  126.9, 126.9, 149.5
  α, β, γ (°)           90.0, 90.0, 90.0     90.0, 90.0, 90.0     90.0, 90.0, 90.0     90.7, 96.6, 117.3    90.0, 90.0, 90.0     90.0, 90.0, 90.0     90.0, 90.0, 90.0
Wavelength (Å)          0.9792               1.6039               0.9795               0.9795               0.9795               1.7365               0.9791
Resolution (Å)          200 – 2.80           200 – 3.30           100 – 2.15           100 – 2.80           100 – 3.89           100 – 5.00           100 – 2.65
                        (2.87 – 2.80)        (3.36 – 3.30)        (2.21 – 2.15)        (2.87 – 2.80)        (3.99 – 3.89)        (5.13 – 5.00)        (2.72 – 2.65)
Rsym (%) ‡              6.5 (69.0)           12.4 (38.6)          5.3 (60.8)           10.5 (83.2)          9.8 (126.4)          7.2 (107.1)          6.9 (167.8)
Rmeas (%) ‡             7.2 (76.5)           –§                   6.2 (71.5)           11.7 (91.9)          10.2 (131.9)         8.6 (127.2)          7.2 (174.4)
CC1/2 ‡                 99.9 (77.1)          –§                   99.9 (87.7)          99.7 (84.3)          99.9 (87.0)          99.9 (54.3)          100.0 (68.5)
<I/σ(I)> ‡              16.4 (2.3)           15.7 (7.0)           15.2 (2.1)           12.0 (2.0)           15.7 (2.0)           7A (1.1)             26.8 (1.8)
Completeness (%) ‡      99.5 (99.9)          99.5 (100.0)         98.5 (99.4)          94.9 (95.5)          99.9 (100.0)         97.7 (98.2)          100.0 (99.9)
Redundancy ‡            5.3 (5.4)            11.9 (11.5)          3.6 (3.6)            5.6 (5.6)            12.2 (12.4)          3.5 (3.3)            12.8 (13.4)
Refinement
Resolution (Å) ‡                                                  100 – 2.15           100 – 2.80           100 – 3.89                                100 – 2.65
                                                                  (2.21 – 2.15)        (2.87 – 2.80)        (3.99 – 3.89)                             (2.72 – 2.65)
No. reflections ‡                                                 40511 (2970)         59284 (4419)         38480 (2819)                              18067 (1319)
Rwork / Rfree                                                     0.183/0.227          0.183/0.230          0.250/0.257                               0.172/0.203
No. atoms
  protein                                                         5766                 14668                14500                                     2090
  Cbl                                                             364                  728                  728                                       91
  5'-deoxyadenosine                                               72                   144                  144                                       -
  water                                                           259                  -                    -                                         19
  DNA                                                             -                    -                    2120                                      -
  glycerol                                                        6                    -                    -                                         6
  chloride                                                        -                    -                    -                                         2
B-factors
  protein                                                         48.5                 76.4                 164.4                                     85.9
  Cbl                                                             45.1                 65.2                 153.9                                     100.8
  5'-deoxyadenosine                                               46.1                 75.8                 158.5                                     -
  water                                                           47.3                 -                    -                                         71.2
  DNA                                                             -                    -                    231.3                                     -
  glycerol                                                        54.3                 -                    -                                         95.3
  chloride                                                        -                    -                    -                                         94.7
R.m.s. deviations
  Bond lengths (Å)                                                0.004                0.005                0.005                                     0.004
  Bond angles (°)                                                 0.82                 0.92                 0.79                                      0.74
Rotamer outliers (%)                                              5 (0.9%)             8 (0.6%)             10 (0.8%)                                 0 (0.0%)

* Structure was not refined to completion. 
† Bijvoet pairs were not merged during data processing. 
‡ Values in parentheses indicate highest-resolution bin. 
§ Values were not reported in the version of Scalepack used for scaling. 


©2015 Macmillan Publishers Limited. All rights reserved 




doi:10.1038/nature15708 


Flows of X-ray gas reveal the disruption of a star by a massive black hole
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Richard Mushotzky°, Paul O’Brien’, Frits Paerels', Jelle de Plaa’, Enrico Ramirez-Ruiz!>, Tod Strohmayer’ & Nial Tanvir'’ 


Tidal forces close to massive black holes can violently disrupt stars
that make a close approach. These extreme events are discovered
via bright X-ray and optical/ultraviolet flares in galactic centres.
Prior studies based on modelling decaying flux trends have been able
to estimate broad properties, such as the mass accretion rate. Here we
report the detection of flows of hot, ionized gas in high-resolution
X-ray spectra of a nearby tidal disruption event, ASASSN-14li in the
galaxy PGC 043234. Variability within the absorption-dominated spectra
indicates that the gas is relatively close to the black hole. Narrow
linewidths indicate that the gas does not stretch over a large range of
radii, giving a low volume filling factor. Modest outflow speeds of a
few hundred kilometres per second are observed; these are below the
escape speed from the radius set by variability. The gas flow is
consistent with a rotating wind from the inner, super-Eddington region
of a nascent accretion disk, or with a filament of disrupted stellar
gas near to the apocentre of an elliptical orbit. Flows of this sort
are predicted by fundamental analytical theory and more recent
numerical simulations.

ASASSN-14li was discovered in images obtained on 22 November
2014 (modified Julian day MJD 56,983), at a visual magnitude of
V = 16.5 (ref. 15) by the All-Sky Automated Survey for Supernovae
(ASASSN). Follow-up observations found this transient source to
coincide with the centre of the galaxy PGC 043234 (originally
Zwicky VIII 211), to within 0.04 arcseconds (ref. 15). This galaxy lies
at a redshift of z = 0.0206, or a luminosity distance of 90.3 Mpc (for
H₀ = 73 km s⁻¹ Mpc⁻¹, Ω_matter = 0.27, Ω_Λ = 0.73), making ASASSN-14li
the closest disruption event discovered in over ten years. The discovery
magnitudes indicated a substantial flux increase over prior, archival
optical images of this galaxy. Follow-up observations with the Swift
space observatory’s X-ray Telescope (XRT) established a new X-ray
source at this location.

Archival X-ray studies rule out the possibility that PGC 043234
harbours a standard active galactic nucleus that could produce bright
flaring. PGC 043234 is not detected in the ROSAT All-Sky Survey.
Using the online interface to the data, the background count rate for
sources detected in the vicinity is 0.002 counts s⁻¹ arcmin⁻². With
standard assumptions (see Methods), this rate corresponds to a
luminosity of L ≈ 4.8 × 10⁴⁰ erg s⁻¹, which is orders of magnitude
below a standard active nucleus.

Theory predicts that early tidal disruption event (TDE) evolution
should be dominated by a bright, super-Eddington accretion phase,
and be followed by a characteristic t^(−5/3) decline as disrupted
material interacts and accretes. Detections of winds integral to
super-Eddington accretion have not been reported previously, but
t^(−5/3) flux decay trends in the ultraviolet part of the spectrum
(where disk emission from active nuclei typically peaks) are now a
standard signature of TDEs in the literature. Figure 1 shows the flux
decay of ASASSN-14li, as observed by Swift. A fit to the UVM2-filter
data assuming an index of α = −5/3 gives a disruption date of
t₀ ≈ 56,948 ± 3 (MJD). The V-band light is consistent with a shallower
t^(−5/12) decay; this can indicate direct thermal emission from the
disk, or reprocessed emission (see Methods).

We triggered approved XMM-Newton programs to study ASASSN-14li
soon after its discovery. Although the space observatory XMM-Newton
carries several instruments, the spectra from the two
Reflection Grating Spectrometer (RGS) units are the focus of this
analysis. We were also granted a Director’s Discretionary Time

Figure 1 | The multi-wavelength light curves of ASASSN-14li clearly
signal a tidal disruption event. The light curves are based on monitoring
observations with the Swift satellite. The errors shown on plotting symbols
are the 1σ confidence limits on the flux in each band (V, B, U, UVW1 (here
W1), UVM2 (M2), UVW2 (W2)). Contributions from the host galaxy have been
subtracted (see Methods). The UVM2 filter samples the ultraviolet light
especially well. The grey shading depicts the t^(−5/3) flux decay predicted
by fundamental theory. The X-ray flux points carry relatively large errors; a
representative error bar is shown at right. Fits to the decay curve are described
in the main text and in the Methods.
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Table 1 | Modelling of the high-resolution X-ray spectra reveals ionized flows of gas 


Mission | XMM-Newton | XMM-Newton | XMM-Newton | XMM-Newton | Chandra | XMM-Newton
Observation ID | 0694651201 | 0722480201 | 0722480201 | 0722480201 | 17566, 17567 | 0694651401
Comment | Monitoring | Long stare | Stare (low) | Stare (high) | None | Monitoring
Start (MJD) | 56,997.98 | 56,999.54 | 56,999.94 | 57,000.0 | 56,999.97, 57,002.98 | 57,023.52
Duration (ks) | 22 | 94 | 36 | 58 | 35, 45 | 23.6
F_X,b (10⁻¹¹ erg cm⁻² s⁻¹) | 2.7 ± 0.7 | 3.2 ± 0.4 | 3.4 ± 0.3 | 3.4 ± 0.2 | 2b1R 8 | 2.68 ± 0.08
L_X,b (10⁴⁴ erg s⁻¹) | 2.9 ± 0.7 | 2.2 ± 0.3 | 2.2 ± 0.2 | 2.0 ± 0.1 | 1.7793 | 3.2 ± 0.1
F_X,f (10⁻¹¹ erg cm⁻² s⁻¹) | 1.2 ± 0.3 | 1.2 ± 0.2 | 1.07 ± 0.08 | 1.24 ± 0.08 | 1.0233 | 1.19 ± 0.04
L_X,f (10⁴⁴ erg s⁻¹) | 0.25 ± 0.06 | 0.21 ± 0.03 | 0.19 ± 0.01 | 0.21 ± 0.01 | 0.174991 | 0.27 ± 0.01
N_H,MW (10²⁰ cm⁻²) | 2.6* | 2.6 ± 0.6 | 2.6* | 2.6* | 2.6* | 2.6*
N_H,HG (10²⁰ cm⁻²) | 1.4* | 1.4 ± 0.5 | 1.4* | 1.4* | 1.4* | 1.4*
N_H,TDE (10²² cm⁻²) | 0.7 ± 0.2 | 1gt08 | 0:1703 | 0.9 ± 0.2 | 0.5 ± 0.4 | 0.5 ± 0.1
log ξ (erg cm s⁻¹) | 3.6 ± 0.1 | 4.1 ± 0.2 | 4.1 ± 0.1 | 3.9 ± 0.3 | 3.9483 | 3.7 ± 0.1
v_rms (km s⁻¹) | 130 ± 30 | 110*38 | 60 ± 8 | 120 ± 20 | 120*%38 | 230*89
v_shift (km s⁻¹) | −180 ± 60 | −210 ± 40 | −360 ± 50 | ~130;%9 | —500+89 | −490 ± 70
kT (eV) | 50.0 ± 0.09 | 51.4 ± 0.1 | 50.0 ± 0.4 | 52.6 ± 0.4 | 52.6 ± 0.3 | 49.7 ± 0.9
Emitting area (10²⁵ cm²) | 5.7 ± 1.4 | 3.7 ± 0.5 | 4.0 ± 0.3 | 3.0 ± 0.2 | 25753 | 6.1 ± 0.2
χ²/ν | 704.8/567 | 870.5/563 | 687.8/564 | 726.8/565 | 266.5/178 | 626.5/566


Each spectrum was fitted with a simple blackbody continuum, modified by photoionized absorption via the pion model, and interstellar absorption in the host galaxy PGC 043234 and the Milky Way. The fits were
made using SPEX, minimizing a χ² statistic. In all cases, 1σ errors are quoted. Where a parameter is quoted with an asterisk, the listed parameter was not varied. X-ray fluxes F_X and luminosities L_X listed with the
subscript ‘b’ for ‘broad’ were extrapolated from the fitting band to the 1.24–124 Å band; those with the subscript ‘f’ represent values for the 18–35 Å fitting band. Interstellar column densities N_H are separately
measured for the Milky Way (N_H,MW) at zero redshift and the host galaxy PGC 043234 (N_H,HG) at a redshift of z = 0.0206. These parameters were measured in the XMM-Newton ‘long stare’ and then fixed in fits to
other spectra. Variable parameters in the photoionization model are listed together; the negative v_shift values indicate a blueshift relative to the host galaxy. Here ξ is the ionization parameter of the gas, v_rms is the
root-mean-square velocity width of the spectral lines, k is Boltzmann's constant, T is temperature and ν is the number of degrees of freedom.


observation with the Chandra X-ray Observatory, using its Low 
Energy Transmission Grating spectrometer (LETG), paired with its 
High Resolution Camera for spectroscopy (HRC-S). 

The 18–35 Å X-ray spectra of ASASSN-14li are clearly thermal in
origin, so we modelled the continuum with a single blackbody,
modified by interstellar absorption in PGC 043234 and the Milky Way, and
absorption from blueshifted, ionized gas local to the TDE. The
self-consistent photoionization code pion was used to model the complex
absorption spectra (see Table 1 and Methods).

Assuming that the highest bolometric luminosity derived from fits
to the high-resolution spectra (L = (3.2 ± 0.1) × 10⁴⁴ erg s⁻¹)
corresponds to the Eddington limit, a black-hole mass of 2.5 × 10⁶ M☉,
where M☉ is the mass of the Sun, is inferred. The blackbody emission
measured from fits to the time-averaged XMM-Newton spectrum
gives an emitting area of 3.7 × 10²⁵ cm², implying r = 1.7 × 10¹² cm
for a spherical geometry.

This is consistent with the innermost stable circular orbit around a
black hole of mass M ≈ 1.9 × 10⁶ M☉. Modelling of the Swift light
curves (see Fig. 1), using a self-consistent treatment of direct and
reprocessed light from an elliptical accretion disk, gives a mass in
the range of M ≈ (0.4–1.2) × 10⁶ M☉ (see Methods). Together, the
thermal spectrum, implied radii and the run of emission from
X-rays to optical bands unambiguously signal the presence of an
accretion disk in ASASSN-14li.

Figure 2 | The high-resolution X-ray spectra of ASASSN-14li reveal
blueshifted absorption lines. Spectra from the ‘long stare’ with XMM-Newton
and the combined Chandra spectrum are shown. XMM-Newton spectra
from the RGS1 and RGS2 units are shown in black and blue, respectively; the
RGS2 unit is missing a detector in the 20–24 Å band. The best-fit photoionized
absorption model for the outflowing gas detected in each spectrum is shown
in red (see Methods), and selected strong lines are indicated. Below each
spectrum, the goodness-of-fit statistic (Δχ²) is shown before (cyan) and after
(black) modelling the absorbing gas. The errors on the spectra are 1σ
confidence limits on the flux in each bin.

Figure 3 | The temperature of the blackbody continuum emission from
ASASSN-14li is steady over time. The temperature measured in simple
blackbody fits to Swift/XRT monitoring observations is plotted versus time.
Errors are 1σ confidence intervals. The temperature is remarkably steady,
contrasting strongly with the declining fluxes shown in Fig. 1. Recent theory
suggests that winds may serve to maintain steady temperatures in some TDEs.


Figure 2 shows the best-fit model for the spectra obtained in the
‘long stare’ with the XMM-Newton/RGS (see Table 1 and Methods).
An F-test finds that photoionized X-ray absorption is required in fits to
these spectra at more than the 27σ level of confidence, relative to a
spectral model with no such absorption. The model captures the
majority of the strong absorption lines, giving χ² = 870.5 for 563
degrees of freedom (see Table 1). The strongest lines in the spectrum
coincide with ionized charge states of N, O, S, Ar and Ca. Only solar
abundances are required to describe the spectra. The Chandra
spectrum independently confirms these results in broad terms, and
requires absorption at more than the 6σ level of confidence.

A hard lower limit on the radius of the absorbing gas is set by the
blackbody continuum. The best radius estimate probably comes from
variability timescales within the XMM-Newton ‘long stare’. Analysis
of specific time segments within the ‘long stare’, as well as flux-selected
segments, reveals that the absorption varies (see Table 1 and Methods).
This sets a relevant limit of r ≲ cδt, or r ≲ 3 × 10¹⁵ cm, where r is the
radius of the absorbing gas relative to the central engine, c is the speed
of light, and δt is the time interval of the variability. Although the
column density and ionization do not vary significantly, the
blueshift of the gas does. During the initial third of the observation,
the blueshift is larger, v_shift = −360 ± 50 km s⁻¹, but falls to
v_shift = −130 km s⁻¹ in the final two-thirds. Shorter monitoring
observations with XMM-Newton reveal evolution of the absorbing gas,
including changes in ionization and column density, before and after
the ‘long stare’ (see Table 1 and Methods).
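
As a rough numerical check of the light-crossing argument above, the limit r ≲ cδt can be evaluated directly. The sketch below assumes, purely for illustration, that δt is the full 94 ks duration of the ‘long stare’ listed in Table 1; the function name is hypothetical and this choice of δt is an assumption rather than a value stated in the text.

```python
C_CM_PER_S = 2.998e10  # speed of light in cm/s


def light_crossing_radius_cm(delta_t_s: float) -> float:
    """Upper limit on the size of a region that varies on a timescale delta_t_s."""
    return C_CM_PER_S * delta_t_s


# Assumption: use the 94 ks 'long stare' duration from Table 1 as delta t.
r_max_cm = light_crossing_radius_cm(94e3)
print(f"r <~ {r_max_cm:.1e} cm")
```

With this choice the limit lands close to the quoted 3 × 10¹⁵ cm; a shorter δt, such as the 36 ks low-flux segment, would tighten it in proportion.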

Fundamental theoretical treatments of TDEs predict an initial
near-Eddington or super-Eddington phase; this is confirmed in more
recent theoretical studies. The high-resolution X-ray spectra were
obtained within the predicted time frame for super-Eddington accretion,
for our estimates of the black-hole mass. Although the ionization
parameter of the observed gas is high, the ionizing photon
distribution peaks at a low energy, and the wind could be driven by
radiation force. Such flows are naturally clumpy, and may be similar to
the photospheres of novae. Given the strong evidence of an accretion
disk in our observations of ASASSN-14li, the X-ray outflow is best
associated with a wind from the inner regions of a nascent,
super-Eddington accretion disk. The local escape speed at an absorption
radius of r ≈ 10⁴ GM/c² (appropriate for M ≈ 10⁶ M☉; G is Newton’s
gravitational constant) exceeds the observed outflow line-of-sight
speed of the gas, but Keplerian rotation is not encoded in absorption,
and projection effects are also important. The small width of the
absorption lines relative to the escape velocity may also indicate a
low volume filling factor, consistent with a clumpy outflow or shell.
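
For scale, the Newtonian escape speed at a radius expressed in units of GM/c² is v_esc = c √(2 / (r c²/GM)), which does not depend on the black-hole mass. The minimal sketch below (the function name is hypothetical) evaluates it at the r ≈ 10⁴ GM/c² quoted above; it is an order-of-magnitude illustration only and ignores relativistic corrections.

```python
import math

C_KM_PER_S = 2.998e5  # speed of light in km/s


def escape_speed_km_s(r_in_gravitational_radii: float) -> float:
    """Newtonian escape speed at r = x * GM/c^2, i.e. c * sqrt(2 / x)."""
    return C_KM_PER_S * math.sqrt(2.0 / r_in_gravitational_radii)


v_esc = escape_speed_km_s(1.0e4)
print(f"v_esc ~ {v_esc:.0f} km/s at r = 1e4 GM/c^2")
```

The result is a few thousand km s⁻¹, roughly an order of magnitude above the line-of-sight outflow speeds of a few hundred km s⁻¹ in Table 1.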

The existing observations show a general trend towards higher
outflow speeds with time. Corresponding changes in ionization and
column density are more modest, and not clearly linked to outflow speed.
However, some recent work has predicted higher outflow speeds in an
initial super-Eddington disk regime, and lower outflow speeds in a
subsequent thin disk regime. An observation in an earlier, more
highly super-Eddington phase might have observed broader lines and
higher outflow speeds; future observations of new TDEs can test this.

Figure 3 shows the time evolution of the blackbody temperature
measured in Swift/XRT monitoring observations. The temperature is
remarkably constant, especially in contrast to the optical/ultraviolet
decline shown in Fig. 1. Observations of steady blackbody temperatures,
despite decaying multi-wavelength light curves in some TDEs, have
recently been explained through winds. Evidence of winds in our data
supports this picture.

The low gas velocities may also be consistent with disrupted stellar
gas on an elliptical orbit in a nascent disk, near the apocentre. This
picture naturally gives a low filling factor, resulting in a small total
mass in absorbing gas (see Methods). Recent numerical simulations
predict that a fraction of the disrupted material in a TDE will
circularize slowly, and that flows will be filamentary, while stellar gas
that is more tightly bound can form an inner, Eddington-limited or
super-Eddington disk more quickly.

The highly ionized, blueshifted gas discovered in our high-resolution
X-ray spectra of ASASSN-14li confirms both fundamental
and very recent theoretical predictions for the structure and evolution
of TDEs. By pairing high-resolution X-ray spectroscopy with an
ever-increasing number of TDE detections, it will become possible to test
models of accretion disk formation and evolution, and to explore
strong-field gravitation around massive black holes.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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METHODS 

Estimates of prior black-hole luminosity. Using the ROSAT All-Sky Survey,
the region around the host galaxy, PGC 043234, was searched for point sources. No
sources were found. Points in the vicinity of the host galaxy were examined to derive
a background count rate of 0.002 counts s⁻¹. Assuming the Milky Way column
density N_H,MW along this line of sight, and taking a typical Seyfert X-ray spectral
index of Γ = 1.7, this count rate translates into L ≈ 4.8 × 10⁴⁰ erg s⁻¹. This limit is
orders of magnitude below a Seyfert or quasar luminosity.

Optical/ultraviolet monitoring observations and data reduction. Swift
monitors transient and variable sources via co-aligned X-ray (XRT, 0.3–10 keV) and
ultraviolet-optical (UVOT, 170–650 nm) telescopes. High-cadence monitoring of
ASASSN-14li with UVOT has continued in six bands: V, B, U, UVW1, UVM2,
and UVW2 (central wavelengths λ_c = 550 nm, 440 nm, 350 nm, 260 nm, 220 nm
and 190 nm).

All observations were processed using the latest HEASOFT
(http://heasarc.gsfc.nasa.gov/docs/software/lheasoft/) suite and calibrations. Individual
optical/ultraviolet exposures were astrometrically corrected and sub-exposures
in each filter were summed. Source fluxes were then extracted from an aperture
of 3″ radius, and background fluxes were extracted from a source-free region to the
east of ASASSN-14li owing to the presence of a (blue) star lying 10 arcsec to the
south, using UVOTMAGHIST, a routine within HEASOFT.

To estimate the host contamination, we have measured the host flux in a 3″
aperture (matched to the aperture used for the UVOT photometry) in
pre-outburst Sloan Digital Sky Survey (SDSS), 2 Micron All-Sky Survey (2MASS), and
GALEX images. We took extra care to deblend the GALEX data, where the large
point spread function (PSF) resulted in contamination from the star about 10″ to
the south. We estimated the uncertainty in each host flux by varying the inclusion
aperture from 2″ to 4″.

We then fitted the host photometry to synthetic galaxy templates using the 
Fitting and Assessment of Synthetic Templates (FAST) code. We employed
stellar templates from the referenced catalogue, and allowed the star-formation history,
extinction law, and initial mass function to vary over the full range of parameters 
allowed by the software. All best-fit models had stellar masses of about 10°?Mo, 
low ongoing star-formation rates (at most about 10 '°Mayr *), and modest 
line-of-sight extinction (Ay < 0.4 mag). 

We integrated the resulting galaxy template spectra over each UVOT filter
bandpass to estimate the host count rate. For the uncertainty in this value, we
adopt either the root-mean-square spread of the resulting galaxy template models,
or 10% of the inferred count rate, whichever value was larger. We then subtracted
these values from our measured (coincidence-loss corrected) photometry of the
host plus transient, to isolate the component that is due to the TDE. For reference, our
inferred count rates for each UVOT filter are 5.7 ± 0.6 s⁻¹ for V, 9.4 ± 0.9 s⁻¹ for
B, 4.0 ± 0.4 s⁻¹ for U, 0.83 ± 0.08 s⁻¹ for UVW1, 0.29 ± 0.03 s⁻¹ for UVM2, and
0.49 ± 0.05 s⁻¹ for UVW2. Figure 1 shows the host-subtracted optical and
ultraviolet light curves of ASASSN-14li.
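
The host-subtraction rule described above can be sketched in a few lines. The function names, the example measurement and the template spread below are hypothetical, and combining the two uncertainties in quadrature is an assumption (the text does not state how the errors were propagated); only the "larger of r.m.s. spread or 10%" rule and the UVM2 host rate come from the text.

```python
import math


def host_rate_uncertainty(host_rate: float, template_rms: float) -> float:
    """Adopt the larger of the template r.m.s. spread and 10% of the host rate."""
    return max(template_rms, 0.10 * host_rate)


def subtract_host(total_rate, total_err, host_rate, host_err):
    """Return the transient-only count rate and its uncertainty.

    Assumption: the two errors are independent and add in quadrature.
    """
    return total_rate - host_rate, math.hypot(total_err, host_err)


# UVM2 host rate from the text (0.29 counts/s); the template spread and the
# measured host-plus-transient rate are made-up illustrative numbers.
uvm2_host = 0.29
uvm2_host_err = host_rate_uncertainty(uvm2_host, template_rms=0.01)
net_rate, net_err = subtract_host(5.00, 0.10, uvm2_host, uvm2_host_err)
print(f"host error = {uvm2_host_err:.3f} s^-1, net = {net_rate:.2f} +/- {net_err:.2f} s^-1")
```

For the quoted host rates, the 10% floor reproduces the listed uncertainties (for example 0.29 ± 0.03 s⁻¹ for UVM2).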
Fits to the UVOT/UVM2 light curve. The UVM2 filter provides the most robust
trace of the mass accretion rate in a TDE like ASASSN-14li; it has negligible
transmission at optical wavelengths. Fits to the UVM2 light curve with a power
law of the form f(t) = f₀ × (t + t₀)^α with a fixed index of α = −5/3 imply a
disruption date of t₀ = 56,980 ± 3 (MJD). This model achieves a fair characterization
of the data; high fluxes between days 80 and 100 (in the units of Fig. 1) result in a
poor statistical fit (χ²/ν = 1.7, where ν = 54 degrees of freedom). If the light curve
is fitted with a variable index, a value of −2.6 ± 0.3 is measured (90% confidence).
This model achieves an improved fit (χ²/ν = 1.4, for ν = 53 degrees of freedom),
but it does not tightly constrain the disruption date, placing t₀ in the MJD 56,855–
56,920 range. That disruption window is adjacent to an interval wherein the
ASASSN monitoring did not detect the source, making it less plausible than
the fit with α = −5/3.
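
To illustrate how a fixed-index power law constrains the disruption date, the sketch below fits noiseless synthetic data. It is not the χ² fit used in the paper: it assumes the decay is written in terms of elapsed time, f = f₀ (t − t₀)^α, linearizes it as f^(1/α) = f₀^(1/α) (t − t₀), and applies an unweighted least-squares line. The function name, the synthetic t₀ and f₀, and the sampling are all hypothetical.

```python
ALPHA = -5.0 / 3.0  # fixed decay index, as in the text


def fit_fixed_index_power_law(times, fluxes, alpha=ALPHA):
    """Estimate (f0, t0) for f = f0 * (t - t0)**alpha with alpha held fixed.

    Raising the flux to the power 1/alpha makes the model linear in t:
    y = f**(1/alpha) = slope * (t - t0), with slope = f0**(1/alpha).
    """
    ys = [f ** (1.0 / alpha) for f in fluxes]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    slope = sxy / sxx
    t0 = t_mean - y_mean / slope  # time at which the fitted line crosses zero
    f0 = slope ** alpha
    return f0, t0


# Synthetic, noiseless light curve with made-up parameters (not the real data).
T0_TRUE = 56950.0
F0_TRUE = 1.0e3
times = [56990.0 + 5.0 * i for i in range(20)]
fluxes = [F0_TRUE * (t - T0_TRUE) ** ALPHA for t in times]

f0_fit, t0_fit = fit_fixed_index_power_law(times, fluxes)
print(f"recovered t0 = {t0_fit:.2f} (MJD), f0 = {f0_fit:.1f}")
```

With real, noisy photometry the points would need to be weighted by their uncertainties, and freeing the index makes t₀ and α strongly degenerate, which is consistent with the loosely constrained disruption window described above.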

The optical bands appear to have a shallower decay curve than the ultraviolet
bands. Recent theory predicts that optical light produced via thermal disk
emission should show a decay consistent with t^(−5/12); this might also be due to
reprocessing. The V-band data are consistent with this prediction, though the data are
of modest quality and a broad range of decays are permitted.
X-ray monitoring observations and data reduction. The Swift XRT is a
charge-coupled device. In such cameras, photon pile-up occurs when two or more
photons land within a single detection box during a single frame time. This causes
flux distortions and spectral distortions in bright sources. Such distortions are
effectively avoided by extracting events from an annular region, rather than from a 
circle at the centre of the telescope PSF. We therefore extracted source spectra from 
annuli with an inner radius of 12 arcsec (5 pixels), and an outer radius of 50 arcsec. 
Background flux was measured in an annular region extending from 140 arcsec to 
210 arcsec. 


Standard redistribution matrices were used; an ancillary response file was
created with the xrtmkarf tool (a routine within HEASOFT) using a
vignetting-corrected exposure map. The source spectra were rebinned to have 20 counts
per bin with grppha. In all spectral fits, we adopted a lower spectral bound of
0.3 keV (36 Å). The upper bound on spectral fits varied depending on the
boundary of the last bin with at least 20 counts; this was generally around 1 keV (12 Å).

The XRT spectra were fitted with a model consisting of absorption in the Milky
Way of a blackbody emitted at the redshift of the TDE, that is,
pha(zashift(bbodyrad)), where N_H = 4 × 10²⁰ cm⁻² and z = 0.0206. The evolution of the best-fit
temperature of this blackbody component is displayed in Fig. 3.

The blackbody temperature values measured from the Swift XRT are slightly
higher (by kT ≈ 7–10 eV) than those measured with XMM-Newton and Chandra. If
an outflow component with fiducial parameters is included in the spectral model
anyway, the XRT temperatures are then in complete agreement with those
measured using XMM-Newton and Chandra.
Estimates of the black-hole mass. Luminosity values inferred for the band over
which the high-resolution spectra are actually fitted, and for a broader band, are
listed in Table 1. Taking the broader values as a proxy for a true bolometric fit, the
highest implied soft X-ray luminosity is measured in the last XMM-Newton
monitoring observation, giving L ≈ 3.2 × 10⁴⁴ erg s⁻¹. The Eddington luminosity
for standard hydrogen-rich accretion is L_Edd = 1.3 × 10³⁸ erg s⁻¹ (M/M☉). This
implies a black-hole mass of M ≈ 2.5 × 10⁶ M☉.

Blackbody continua imply size scales, and, if we assume that optically thick
blackbody emission can only originate at radii larger than the innermost stable
circular orbit (ISCO), also masses. For a non-spinning Schwarzschild black hole,
r_ISCO = 6GM/c². The blackbody emission measured in fits to the time-averaged
XMM-Newton ‘long stare’ gives an emitting area of 3.7 × 10²⁵ cm², implying
r = 1.7 × 10¹² cm for a spherical geometry. The actual geometry may be more
disk-like, but the inner flow may be a thick disk that is better represented by a
spherical geometry. If the black hole powering ASASSN-14li is not spinning, this
size implies a black-hole mass of M ≈ 1.9 × 10⁶ M☉.
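
The two mass estimates above follow from short calculations, reproduced in the sketch below. The input values (L ≈ 3.2 × 10⁴⁴ erg s⁻¹, the Eddington coefficient 1.3 × 10³⁸ erg s⁻¹ per solar mass, and the emitting area 3.7 × 10²⁵ cm²) are taken from the text; the function names and the rounded physical constants are illustrative choices.

```python
import math

G_CGS = 6.674e-8       # gravitational constant, cm^3 g^-1 s^-2
C_CGS = 2.998e10       # speed of light, cm/s
M_SUN_G = 1.989e33     # solar mass, g
L_EDD_PER_MSUN = 1.3e38  # erg/s per solar mass (hydrogen-rich accretion, from the text)


def eddington_mass_msun(luminosity_erg_s: float) -> float:
    """Mass for which the given luminosity equals the Eddington luminosity."""
    return luminosity_erg_s / L_EDD_PER_MSUN


def sphere_radius_cm(area_cm2: float) -> float:
    """Radius of a sphere with surface area 4 * pi * r^2."""
    return math.sqrt(area_cm2 / (4.0 * math.pi))


def isco_mass_msun(r_isco_cm: float) -> float:
    """Mass of a non-spinning black hole with r_ISCO = 6 G M / c^2."""
    return r_isco_cm * C_CGS ** 2 / (6.0 * G_CGS) / M_SUN_G


m_edd = eddington_mass_msun(3.2e44)
r_bb = sphere_radius_cm(3.7e25)
m_isco = isco_mass_msun(r_bb)
print(f"M_Edd  ~ {m_edd:.2e} Msun")
print(f"r_bb   ~ {r_bb:.2e} cm")
print(f"M_ISCO ~ {m_isco:.2e} Msun")
```

Both routes give a few times 10⁶ M☉, in line with the values of 2.5 × 10⁶ M☉ and 1.9 × 10⁶ M☉ quoted above.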

We also estimated the mass of the black hole at the heart of ASASSN-14li by
fitting the host-subtracted light curves (see Fig. 1) using the Monte Carlo software
TDEFit. This software assumes that emission is produced within an elliptical
accretion disk where the mass accretion rate follows the fallback rate onto the
black hole with a viscous delay. This emission is then partly reprocessed into
the ultraviolet/optical part of the spectrum by an optically thick layer.
Super-Eddington accretion is treated by presuming that a fitted fraction of the Eddington
excess is converted into light that is reprocessed by the same optically thick layer.
This excess can be produced either with an unbound wind, or with the energy
deposited by shocks in the circularization process.

The software performs a maximum-likelihood analysis to determine the
combinations of parameters that reproduce the observed light curves. We utilize the
ASASSN, UVOT and XRT data in our light-curve fitting; the most likely models
produce good fits to all bands simultaneously. Within the context of this TDE
model, a black-hole mass of (0.4–1.2) × 10⁶ M☉ (1σ) is derived.

Spectroscopic observations, data reduction and analysis. Table 1 lists the obser- 
vation identification number, start time, and duration of all of the XMM-Newton 
and Chandra observations considered in our work. 

The XMM-Newton data were reduced using the standard Science Analysis 
System (SAS version 13.5.0) tools and the latest calibration files. The rgsproc 
routine was used to generate spectral files from the source, background spectral 
files, and instrument response files. The spectra from the RGS1 and RGS2 units 
were fitted jointly. Prior to fitting models, all XMM-Newton spectra were binned 
by a factor of five for clarity and sensitivity. 

The Chandra data were reduced using the standard Chandra Interactive
Analysis of Observations (CIAO version 4.7) suite, and the latest associated
calibration files. Instrument response files were constructed using the fullgarf and
mkgrmf routines. The first-order spectra from each observation were combined
using the tool add_grating_orders, and spectra from each observation were then
added using add_grating_spectra.

The spectra were analysed using the SPEX suite version 2.06 (ref. 20). The fitting
procedure minimized a χ² statistic. The spectra are most sensitive in the 18–35 Å
band, and all fits were restricted to this range. Within SPEX, absorption from the
interstellar medium in the Milky Way was modelled using the model ‘hot’; a
separate ‘hot’ component was included to allow for interstellar medium (ISM)
absorption within PGC 043234 at its known redshift (using the reds component in
SPEX). The photoionized outflow was modelled using the pion component within
the SPEX suite.

pion includes numerous lines from intermediate charge states that are lacking
in similar astrophysics packages. The fits explored in this analysis varied the gas
column density (N_H,TDE), the gas ionization parameter (ξ, where ξ = L/nr², and L
is luminosity, n is the hydrogen number density and r is the distance between the
ionizing source and absorbing gas), the root-mean-square velocity of the gas
(v_rms), and the bulk shift of the gas relative to the source, in the source frame
(v_shift). Spectra from segments within the ‘long stare’ made with XMM-Newton
were made by using the SAS tool tabgtigen to create good time interval files to
isolate periods within the light curves of the RGS data.

The Chandra/LETG spectra were dispersed onto the HRC, which has a
relatively high instrumental background. Fitting the spectra only in the 18–35 Å band
served to limit the contributions of the background. Nevertheless, the Chandra
spectra are less sensitive than the best XMM-Newton spectra of ASASSN-14li (see
Fig. 2). Prior to fitting, spectra from the two exposures were added and then binned
by a factor of three.

Figure 2 includes plots of the Δχ² goodness-of-fit statistic as a function of
wavelength, before and after including pion to model the ionized absorption.
There is weak evidence of emission lines in the spectra, perhaps with a P Cygni
profile (see below). The best-fit models for the high-resolution spectra predict one
absorption line at 34.5 Å (H-like C VI) that is not observed; small variations to
abundances could resolve this disparity.

Blueshifts as small as 200 km s⁻¹ are measured in the XMM-Newton/RGS
using the pion model. According to the XMM-Newton User’s Handbook,
available through the mission website,
http://xmm.esac.esa.int/external/xmm_user_support/documentation/index.shtml,
the absolute accuracy of the first-order
wavelength scale is 6 mÅ. At 18 Å this corresponds to a velocity of 100 km s⁻¹;
at 35 Å, this corresponds to a velocity of 51 km s⁻¹. The model predicts numerous
lines across the 18–35 Å band that are clearly detected; especially with this leverage,
the small shifts we have measured with XMM-Newton are robust. In particular,
the difference in blueshift between the low- and high-flux phases of the ‘long stare’,
−360 ± 50 km s⁻¹ versus −130 km s⁻¹, is greater than the absolute
calibration uncertainties. Differences observed in the outflow velocities between
XMM-Newton observations are as large, or larger, and also robust.
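
The conversion from wavelength-scale accuracy to velocity used above is the non-relativistic Doppler relation Δv = c Δλ/λ. A minimal sketch (the function name is hypothetical) reproduces the two quoted values from the 6 mÅ accuracy:

```python
C_KM_PER_S = 299792.458  # speed of light in km/s


def velocity_uncertainty_km_s(delta_lambda_angstrom: float, lambda_angstrom: float) -> float:
    """Velocity equivalent of a wavelength offset, delta_v = c * delta_lambda / lambda."""
    return C_KM_PER_S * delta_lambda_angstrom / lambda_angstrom


WAVELENGTH_ACCURACY_A = 0.006  # 6 mA absolute accuracy, as quoted in the text

for wavelength in (18.0, 35.0):
    dv = velocity_uncertainty_km_s(WAVELENGTH_ACCURACY_A, wavelength)
    print(f"{wavelength:.0f} A: {dv:.0f} km/s")
```

Because the velocity equivalent shrinks towards longer wavelengths, lines near 35 Å constrain small shifts about twice as tightly as lines near 18 Å.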

The lower sensitivity of the Chandra spectra is evident in the relatively poor
constraints achieved on the column density of the ionized X-ray outflow N_H,TDE
(see Table 1). Similarly, the relatively high outflow velocity measured in the
Chandra spectra should be viewed with a degree of caution. The outflow velocity
changes from about −500 km s⁻¹ to just −130 ± 130 km s⁻¹, for instance, when the
binning factor is increased from three to five. We have found no reports in the
literature of a systematic wavelength offset between contemporaneous
high-resolution spectra obtained with XMM-Newton and Chandra.

The small number of high-resolution spectra complicates efforts to discern
trends. The velocity width of the absorbing gas is fairly constant over time, but
there is a general trend towards higher blueshifts. There is no clear trend in
column density or ionization parameter with time.
Diffuse gas mass, outflow rates and filling factors. There is no a priori constraint
on the density of the absorbing gas. Taking the maximum radius implied by
variability within the XMM-Newton ‘long stare’, r = 3 × 10¹⁵ cm, and manipulating the
ionization parameter equation (ξ = L n⁻¹ r⁻², where L is the luminosity, n is the
number density and r is the absorbing radius), we can derive an estimate of
the density: n ≈ 2 × 10⁹ cm⁻³. Even assuming a uniformly filled sphere out to a
radius of r = 3 × 10¹⁵ cm, a total mass of M ≈ 4 × 10³² g is implied, or
approximately 0.2 M☉.

The true gas mass within r is likely to be orders of magnitude lower, owing to
clumping and a very low volume filling factor. Using the measured value of N_H,TDE
and assuming n ≈ 2 × 10⁹ cm⁻³, N_H,TDE = nΔr gives a value of Δr ≈ 6.5 × 10¹² cm.
The filling factor can be estimated using Δr/r ≈ 0.002. The total mass enclosed out to
a distance r is then reduced accordingly, down to 4 × 10⁻⁴ M☉, assuming a uniform
density within r. This is a small value, plausible either for a clumpy wind or gas
within a filament executing an elliptical orbit.
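
The chain of estimates in the last two paragraphs can be followed numerically. The sketch below assumes the ‘long stare’ values L ≈ 2.2 × 10⁴⁴ erg s⁻¹ and log ξ = 4.1 from Table 1, takes N_H,TDE ≈ 1.3 × 10²² cm⁻² (the value implied by nΔr in the text), and applies a mean atomic weight of 1.23 to convert number density to mass density; these input choices and all variable names are assumptions made for illustration.

```python
import math

M_P_G = 1.6726e-24   # proton mass, g
M_SUN_G = 1.989e33   # solar mass, g
MU = 1.23            # mean atomic weight (assumed to apply to the mass estimate)

LUMINOSITY = 2.2e44  # erg/s, 'long stare' broad-band luminosity (assumed input)
LOG_XI = 4.1         # log ionization parameter, 'long stare'
R_MAX_CM = 3.0e15    # radius limit from variability
N_H_TDE = 1.3e22     # cm^-2, column density implied by n * delta_r in the text

# Density from the ionization parameter, xi = L / (n r^2).
n = LUMINOSITY / (10.0 ** LOG_XI * R_MAX_CM ** 2)

# Mass of a uniformly filled sphere of radius R_MAX_CM.
volume = 4.0 / 3.0 * math.pi * R_MAX_CM ** 3
mass_filled_msun = MU * M_P_G * n * volume / M_SUN_G

# Path length and filling factor from the column density, N_H = n * delta_r.
delta_r = N_H_TDE / n
filling_factor = delta_r / R_MAX_CM
mass_clumpy_msun = mass_filled_msun * filling_factor

print(f"n            ~ {n:.1e} cm^-3")
print(f"filled mass  ~ {mass_filled_msun:.2f} Msun")
print(f"delta_r      ~ {delta_r:.1e} cm")
print(f"filling      ~ {filling_factor:.4f}")
print(f"clumpy mass  ~ {mass_clumpy_msun:.1e} Msun")
```

Carrying unrounded intermediate values gives a clumpy mass of a few × 10⁻⁴ M☉, the same order as the rounded figure quoted above.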
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Formally, the mass outflow rate in ASASSN-14li can be adapted from the case 
where the density is known, and written as: 


$$\dot{M}_{\rm out} = \mu\, m_{\rm p}\, \Omega\, L\, v\, C_v\, \xi^{-1}$$


where μ is the mean atomic weight (μ = 1.23 is typical), m_p is the mass of the
proton, Ω is the covering factor (0 ≤ Ω ≤ 4π), L is the ionizing luminosity, v is
the outflow velocity, C_v is the line-of-sight global (volume) filling factor and ξ is the
ionization parameter. Using the values obtained in fits to the XMM-Newton ‘long
stare’ (see Table 1), for instance, Ṁ_out ≈ 7.9 × 10²³ Ω C_v g s⁻¹. Taking the value of
C_v derived above, an outflow rate of Ṁ_out ≈ 1.5 × 10²¹ Ω g s⁻¹ results. The kinetic
power in the outflow is given by L_kin = 0.5 Ṁ_out v²; using the same values assumed to
estimate the mass outflow rate, L_kin ≈ 3.3 × 10³⁵ erg s⁻¹.
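
The outflow-rate expression above can be evaluated with the ‘long stare’ values. The sketch assumes L ≈ 2.2 × 10⁴⁴ erg s⁻¹, v = 210 km s⁻¹ and log ξ = 4.1 from Table 1, a filling factor of 0.002, and Ω set to 1 so that the results are per unit covering factor; the function names are hypothetical, and the exact inputs used in the paper may differ slightly.

```python
M_P_G = 1.6726e-24  # proton mass, g


def mass_outflow_rate(lum, v_cm_s, xi, c_v, omega=1.0, mu=1.23):
    """Mass outflow rate mu * m_p * Omega * L * v * C_v / xi, in g/s."""
    return mu * M_P_G * omega * lum * v_cm_s * c_v / xi


def kinetic_power(mdot_g_s, v_cm_s):
    """Kinetic power 0.5 * Mdot * v^2, in erg/s."""
    return 0.5 * mdot_g_s * v_cm_s ** 2


LUMINOSITY = 2.2e44     # erg/s (assumed input from Table 1)
V_OUT = 210.0e5         # 210 km/s expressed in cm/s
XI = 10.0 ** 4.1        # erg cm / s
FILLING_FACTOR = 0.002  # from delta_r / r in the text

mdot_unit_filling = mass_outflow_rate(LUMINOSITY, V_OUT, XI, c_v=1.0)
mdot = mass_outflow_rate(LUMINOSITY, V_OUT, XI, c_v=FILLING_FACTOR)
l_kin = kinetic_power(mdot, V_OUT)

print(f"Mdot / (Omega C_v) ~ {mdot_unit_filling:.1e} g/s")
print(f"Mdot / Omega       ~ {mdot:.1e} g/s")
print(f"L_kin / Omega      ~ {l_kin:.1e} erg/s")
```

These inputs reproduce the quoted rates to within a few per cent, and show that the kinetic power is many orders of magnitude below the radiated luminosity.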

Emission from the diffuse outflow. We synthesized a plausible wind emission 
spectrum by coupling the pion and hyd models within SPEX. The hyd code enables 
spectra to be constructed based on the output of hydrodynamical simulations. As 
inputs, the hyd code requires the electron temperature and ion concentrations for a 
gas; these were taken from our fits with pion. We included the resulting emission 
component in experimental fits to the XMM-Newton ‘long stare’. The best-fit 
model gives an emission measure of (1.0 ± 0.3) X 10 cm⁻³, a redshift (relative
to the host) of 270733) km s⁻¹, and an ionization parameter of log ξ = 4.3 ± 0.1.

According to an F-test, the emission component is only required at the 3σ level;
however, it has some compelling properties. Combined with the blueshifted
absorption spectrum, the redshifted emission gives P Cygni profiles. For the gas
density of n ≈ 2 × 10⁹ cm⁻³ derived previously, the emission measure gives a
radius of about 10¹⁵ cm, comparable to the size scale inferred from absorption
variability.

The strongest lines predicted by the emission model include He-like O VII, and
H-like charge states of C, N and O. This model does not account for other emission
line-like features in the spectra, which are more likely to be artefacts from spectral
binning, or calibration or modelling errors. Emission features in the O K-edge
region may be real, but caution is warranted. Other features are more easily
discounted given that they differ between the RGS1 and RGS2 spectra.

Code availability. All of the data reduction and spectroscopic fitting routines and
packages used in this work are publicly available. The light-curve modelling package,
TDEFit, is proprietary at this time owing to ongoing code development; a
public release is planned within the coming year.
Most stars become white dwarfs after they have exhausted their
nuclear fuel (the Sun will be one such). Between one-quarter and
one-half of white dwarfs have elements heavier than helium in their
atmospheres, even though these elements ought to sink rapidly
into the stellar interiors (unless they are occasionally replenished).
The abundance ratios of heavy elements in the atmospheres
of white dwarfs are similar to the ratios in rocky bodies
in the Solar System. This fact, together with the existence of
warm, dusty debris disks surrounding about four per cent of
white dwarfs, suggests that rocky debris from the planetary
systems of white-dwarf progenitors occasionally pollutes the atmospheres
of the stars. The total accreted mass of this debris is
sometimes comparable to the mass of large asteroids in the Solar
System. However, rocky, disintegrating bodies around a white
dwarf have not yet been observed. Here we report observations 
of a white dwarf—WD 1145+017—being transited by at least 
one, and probably several, disintegrating planetesimals, with per- 
iods ranging from 4.5 hours to 4.9 hours. The strongest transit 
signals occur every 4.5 hours and exhibit varying depths (blocking 
up to 40 per cent of the star’s brightness) and asymmetric profiles, 
indicative of a small object with a cometary tail of dusty effluent 
material. The star has a dusty debris disk, and the star’s spectrum 
shows prominent lines from heavy elements such as magnesium, 
aluminium, silicon, calcium, iron, and nickel. This system provides 
further evidence that the pollution of white dwarfs by heavy ele- 
ments might originate from disrupted rocky bodies such as aster- 
oids and minor planets. 

WD 1145+017 (also designated EPIC 201563164) is a helium- 
envelope white dwarf (Supplementary Table 1) that was observed by 
NASA’s Kepler space telescope during the first campaign of its two- 
wheeled mission—a mission referred to hereafter as K2. After proces- 
sing K2 data taken from WD 1145+017 to produce a light curve and 
correcting for instrumental systematics, we identified a transit-like
signal with a period of 4.5 h by using a box-fitting least-squares search
algorithm. Using a Fourier analysis on the systematic-corrected K2
data, we identified five other weaker, but statistically significant, periodicities
in the data, all with periods between 4.5 h and 5 h (Fig. 1 and
Supplementary Table 2). We examined the dominant periodicity and 
found that the depth and shape of the transits varied substantially over 
the 80 days of K2 observations (Fig. 2). 
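
Folding a light curve on a trial period and averaging in phase bins, the step that underlies both the box-fitting search and the folded curves in Fig. 1, can be sketched as follows. This is a minimal illustration on synthetic data, not the pipeline used for the K2 photometry; the 0.49-h cadence, the 1% box-shaped dip and all function names are assumptions.

```python
def phase_fold(times, fluxes, period, n_bins=20):
    """Fold a light curve on `period` and return the mean flux in each phase bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, flux in zip(times, fluxes):
        phase = (t / period) % 1.0                      # orbital phase in [0, 1)
        idx = min(int(phase * n_bins), n_bins - 1)      # guard against rounding to n_bins
        sums[idx] += flux
        counts[idx] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


# Synthetic light curve (not real K2 data): roughly 80 days sampled every 0.49 h,
# with a 1% dip during the first 5% of each 4.5-h cycle.
PERIOD_H = 4.5
times = [0.49 * i for i in range(3900)]
fluxes = [0.99 if (t / PERIOD_H) % 1.0 < 0.05 else 1.0 for t in times]

binned = phase_fold(times, fluxes, PERIOD_H)
deepest_bin = min(range(len(binned)), key=lambda i: binned[i])
print("deepest phase bin:", deepest_bin, "mean flux:", round(binned[deepest_bin], 4))
```

A box-fitting search repeats this fold over a grid of trial periods and keeps the period at which a box-shaped dip fits the binned curve best; comparing folds of successive time segments, as in Fig. 2, shows how the transit shape evolves.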

We initiated follow-up, ground-based photometry to achieve better 
time resolution of the transits seen in the K2 data (Supplementary 
Fig. 1). We observed WD 1145+017 frequently over the course of 
about a month with the 1.2-metre telescope at the Fred L. Whipple 
Observatory (FLWO) on Mount Hopkins, Arizona; with one of the 
0.7-metre MINiature Exoplanet Radial Velocity Array (MINERVA)
telescopes, also at FLWO; and with four of the eight 0.4-metre telescopes
that compose the MEarth-South Array at the Cerro Tololo
Inter-American Observatory in Chile. Most of these data showed no 
interesting or noteworthy signals, but on two nights we observed deep 
(with up to 40% of the star’s brightness blocked), short-duration 
(5-min), asymmetric transits separated by the dominant 4.5-h period 
identified in the K2 data (Fig. 3). In particular, using the 1.2-m FLWO 
telescope in the V-band (green visible light), we detected two transits 
separated by 4.5 h on the night of 11 April 2015; furthermore, using
four of the eight MEarth-South array telescopes (all in near-infrared 
light, using a 715-nm long-pass filter), we detected two transits sepa- 
rated by the same 4.5-h period on the night of 17 April 2015. The 
transits did not occur at the times predicted from the K2 data, and the 
two transits detected on 11 April happened nearly 180 degrees out of 
phase from the two transits detected on 17 April. Observations with 
MEarth-South in near-infrared light and with MINERVA in white 
visible light the next night (18 April) showed only a possible transit 
event, of 10%-15% depth, in phase with the previous night’s events. 
The 5-min duration of the transits is longer than the roughly 1-min 
duration we would expect for a solid body transiting the white dwarf. 
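
The roughly 1-min expectation can be reproduced with a back-of-the-envelope sketch: a small body on a circular, edge-on orbit crosses the stellar disk in about 2R/v, where v follows from Kepler's third law. The white-dwarf mass (0.6 solar masses) and radius (0.013 solar radii) used here are assumed typical values, not parameters taken from the text, and the function names are illustrative.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_SUN_KG = 1.989e30  # solar mass, kg
R_SUN_M = 6.957e8    # solar radius, m


def orbital_speed(period_s, star_mass_kg):
    """Speed (m/s) on a circular orbit of the given period around the given mass."""
    semi_major_axis = (G * star_mass_kg * period_s ** 2 / (4.0 * math.pi ** 2)) ** (1.0 / 3.0)
    return 2.0 * math.pi * semi_major_axis / period_s


def crossing_time(period_s, star_mass_kg, star_radius_m):
    """Time (s) for a point-like body to cross the stellar diameter."""
    return 2.0 * star_radius_m / orbital_speed(period_s, star_mass_kg)


period_s = 4.5 * 3600.0          # dominant period from the text
star_mass = 0.6 * M_SUN_KG       # assumed white-dwarf mass
star_radius = 0.013 * R_SUN_M    # assumed white-dwarf radius

duration_s = crossing_time(period_s, star_mass, star_radius)
print(f"orbital speed: {orbital_speed(period_s, star_mass) / 1e3:.0f} km/s")
print(f"crossing time: {duration_s:.0f} s")
```

Under these assumptions the crossing time is about a minute, so a 5-min transit points to an occulter several times more extended than the star along the orbit.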

Nonetheless, we confirmed that these events are indeed transits by a 
low-mass object in orbit around the white dwarf. The depth and mor- 
phology of the transits that we see in the ground-based data cannot be 
explained by stellar pulsations, and archival and adaptive optics 
imaging place strong constraints on scenarios involving a binary star 
in the background, whose eclipses might mimic transits of the white 
dwarf (Supplementary Fig. 2). We also obtained spectroscopic obser- 
vations with the MMT Blue Channel spectrograph; we used these 
observations to place limits on radial-velocity variations that would 
indicate stellar companions. The radial-velocity measurements exclude 
companions larger than ten Jupiter masses at the 95% confidence level. 

The spectra also reveal that the atmosphere of the white dwarf 
contains magnesium, aluminium, silicon, calcium, iron, and nickel 
(Supplementary Fig. 3). These elements, which are heavier than 
helium, have settling times that are much shorter than the cooling 
age of the white dwarf, indicating that they have been deposited in the 
white dwarf’s envelope in the past million years, much more recently
than it formed, about 175 ± 75 million years ago. Archival photometry
for this system is well fitted by a 15,900-K, metal-rich white-dwarf 
model spectrum, and we find evidence for excess infrared emission 
consistent with a warm (1,150 K) dusty debris disk (Supplementary 
Fig. 4). 

We interpret these observations as evidence that at least one, and 
probably six or more, disintegrating planetesimals are transiting this 
white dwarf. Disintegrating planets have been observed transiting 
main-sequence stars, and show asymmetric transit profiles and
variable transit depths, behaviours similar to those seen here. These
previously detected disintegrating planets are believed to be heated by
the host star and to be losing mass through Parker-type thermal winds,
in which the molecules condense into the obscuring dust observed to be
occulting the star. The solid bodies themselves are too small to detect,
so the transits are dominated by the much larger dust clouds trailing the
planets. The density of the dust cloud is presumed to be highly variable,
which gives rise to the variable transit depths; in addition, a comet-like
structure for the dust tails would explain the asymmetric transit
shapes. In the case of WD 1145+017, we have identified six stable
periodicities in the K2 light curve that could be explained by occultations
of the central star by dust clouds. We propose that each of these
periodicities could be related to a different planetesimal (or to multiple
fragments of one minor planet) that is orbiting the white dwarf near the
tidal radius for rocky bodies. Each planetesimal would sporadically
launch winds of metal gases, which are probably streaming freely from
the planetesimal and which condense into dust clouds that periodically
block the light of the white dwarf. A trailing dust cloud would explain
the variable transit depths, the asymmetric transit profiles, and the
longer-than-expected transit durations that we see in the light curves
of WD 1145+017 (Supplementary Fig. 5).

We have simulated the dynamics of six planetesimals in circular
orbits with periods of between 4.5 h and 4.9 h, and find that such a
configuration is stable for at least 10° orbits, provided that their masses
are smaller than or comparable to that of the dwarf planet Ceres
(1.6 × 10⁻⁴ M⊕, where M⊕ is the mass of the Earth), or possibly that
of Haumea (6.7 × 10⁻⁴ M⊕). These six planetesimals must be rocky
(because gaseous bodies would overflow their Roche lobes, the region
within which gaseous material can be stably retained by gravity), and
must have densities greater than about 2 g cm⁻³ in order not to be
tidally disrupted in such short-period orbits. We also simulated the
dynamics of two planetesimals in 1:1 mean motion orbital resonances
(for example, in horseshoe orbits), and find that two different planetesimals
in such orbits outbursting at different times could plausibly
explain the difference in orbital phases that we see between the K2 light
curve, the 11 April events, and the 17 April events.

Figure 1 | Six notable periodicities found in the K2 data. a, Harmonic-summed
Lomb-Scargle periodogram of the K2 data. We label the signals A-F
in order of amplitude. b-g, K2 light curves folded on the six statistically
significant (P < 10~*) peaks and binned in phase. When plotting each fold, we
sequentially removed stronger signals by dividing the data set by the binned,
phase-folded light curves of the stronger signals. Note the differences in scales
on the y-axes. Error bars are the standard errors of the mean within each bin.
Brightness is shown relative to the median brightness measurement of
WD 1145+017.

We estimate that a rate of mass loss of roughly 8 × 10° g s⁻¹ is
necessary to explain the transits that we see. Various refractory mate- 
rials (including iron, fayalite, albite, and orthoclase) heated by the 
white dwarf could plausibly sublimate from a planetesimal roughly 
the size of Ceres at this rate, despite the white dwarf’s relatively low 
luminosity (Supplementary Fig. 6). These metal vapours would be lost 
quickly via free-streaming winds or by Jeans escape (a classical thermal 
escape mechanism), because the planetesimal escape velocity is com- 
parable to the metal vapour’s thermal speed. We simulated a dust 
cloud condensed from the escaped metal vapour in orbit, and
found that the radiation environment in which these planetesimals 
are situated can give rise to dust tails like those we infer from the 
ground-based transit observations (Supplementary Fig. 7). Collisions 
with disk debris could also plausibly cause mass from the planetesimal
to be lost into orbit.

A possible scenario for the formation of the disintegrating planetesimals
involves minor planets that are left over from the progenitor
stellar system, before the star evolved into a white dwarf. In this
scenario, mass loss from the host star disturbs the stability of the
planetary system. This can lead to planets or smaller objects (such as
asteroids or comets) being scattered inwards, into orbits with radii
much smaller than the size of the progenitor star when it was an
evolving giant. A challenge for this model is placing the planetesimals
in close concentric orbits so near the star without being totally disrupted.
Current models suggest that planetesimals can be scattered
inwards on highly eccentric orbits, tidally disrupted into elliptical dust
disks, and circularized by Poynting-Robertson drag. However,
bodies that could release enough dust to cause the transits of WD
1145+017 that we have detected are too large to be circularized in this
way. Recent theoretical work on smaller bodies has shown that outgassing
material can quickly circularize orbits, but it is unclear how this
process scales to the massive bodies inferred here.

Figure 2 | Evolution of the K2 transit light curve over 80 days of
observations. We show the K2 light curve broken into segments eight days in
length and folded on the most notable, 4.5-h period. The individual data points
(sampled with a 30-min integration time) are shown as dots, with different
colours representing different time segments. The averaged light curve for each
bin is shown as a solid black line. Each segment is vertically offset for clarity. We
show the typical measurement uncertainty (standard deviation) with a red
error bar on one data point in the upper left.

Our interpretation of this system is still uncertain. In particular, it is
difficult to explain the phase shifts observed between the transits
detected by ground-based photometry and those detected by K2; further
ground-based observations are necessary to understand this
effect. Another possible model is that small rings or debris clouds
of disrupted planetary material in a disk occasionally cross in front of
the star and block its light. Although this could explain the large phase
shifts that we see between the FLWO and MEarth transits, it is difficult
to explain the highly stable periods (ΔP/P < 10~*) seen in the K2 data
without massive orbiting bodies (Supplementary Fig. 8). Fortunately,
the large transit depths make follow-up observations that could distinguish
among these scenarios feasible both from the ground and
from space. It might be possible to detect periodic infrared emission
from the orbiting planetesimals with the James Webb Space Telescope.
Additional follow-up observations such as transit spectroscopy could
constrain both scenarios by detecting the presence of molecules in the
dust tails or the wavelength dependence of the dust scattering.

Figure 3 | Transit light curves measured from two ground-based facilities.
a, Two events observed at FLWO with a separation equal to the 4.5-h period
labelled A in Fig. 1, detected by K2. The first event is blue, and the second is
orange. b, Two events observed by MEarth-South separated by the 4.5-h
A-period. The first event is blue, and the second is green. The typical
MEarth-South measurement uncertainty (standard deviation) is shown as a red
error bar on one data point. The FLWO error bars are smaller than the size
of the symbols.

The evidence presented here—in particular, for the heavy-element 
pollution of the white dwarf WD 1145+017, for a warm dusty debris 
disk around this star, and for transits of disintegrating planetesimals— 
is consistent with a scenario, suggested over the past decade, in which 
the orbits of rocky bodies are occasionally perturbed and pass close 
enough to white dwarfs to become tidally disrupted, leading to the 
infall of debris onto the star’s surface. Observations have shown that 
this scenario could be quite common among white dwarfs, with 
between 25% and 50% of white dwarfs showing evidence of heavy- 
element pollution. Our observations indicate that disintegrating 
planetesimals may be common as well (Supplementary Fig. 9). The 
transits of WD 1145+017 provide evidence of rocky, disintegrating 
bodies around a white dwarf, and support the planetesimal accretion 
model for the pollution of white dwarfs. 
The rise of fully turbulent flow


Dwight Barkley, Baofang Song, Vasudevan Mukund, Grégoire Lemoult, Marc Avila & Björn Hof


Over a century of research into the origin of turbulence in wall- 
bounded shear flows has resulted in a puzzling picture in which 
turbulence appears in a variety of different states competing with 
laminar background flow. At moderate flow speeds, turbulence
is confined to localized patches; it is only at higher speeds that the 
entire flow becomes turbulent. The origin of the different states 
encountered during this transition, the front dynamics of the tur- 
bulent regions and the transformation to full turbulence have yet 
to be explained. By combining experiments, theory and computer 
simulations, here we uncover a bifurcation scenario that explains 
the transformation to fully turbulent pipe flow and describe the 
front dynamics of the different states encountered in the process. 
Key to resolving this problem is the interpretation of the flow as a 
bistable system with nonlinear propagation (advection) of turbulent
fronts. These findings bridge the gap between our understanding
of the onset of turbulence and fully turbulent flows.

The sudden appearance of localized turbulent patches in an otherwise
quiescent flow was first observed by Osborne Reynolds for pipe
flow and has since been found to be the starting point of turbulence in
most shear flows. In this regime of localized turbulence it is
impossible to maintain turbulence over extended regions as it automatically
reduces to discrete patches, each of approximately the
same size. Such patches are called puffs in the context of pipe flow (see
Fig. 1a). Puffs can decay, or else split and thereby multiply. For
Reynolds numbers (dimensionless flow rates) R > 2,040, the splitting
process outweighs decay, resulting in sustained disordered motion.
Although sustained, this turbulence appears only as discrete puffs
surrounded by laminar flow (Fig. 1a), and larger clusters of turbulence
cannot form.

At flow rates larger than those sustaining the regime of localized 
turbulent patches, the situation is fundamentally different: once trig- 
gered, turbulence aggressively expands and eliminates all laminar 
motion (Fig. 1b). The flow is then fully turbulent and only in this state 
do wall-bounded shear flows have characteristic mean properties such 
as the Blasius or Prandtl–von Kármán friction laws. This rise of fully
turbulent flow has remained unexplained, despite the fact that this 
transformation occurs in virtually all shear flows and generally dom- 
inates the dynamics at sufficiently large Reynolds numbers. 

A classic diagnostic for the formation of turbulence is the
propagation speed of the upstream and downstream fronts of a tur- 
bulent patch. We carried out such measurements for pipe and square- 
duct flow (Fig. 1c), focusing on the regime where turbulence first 
begins to expand. In both experiments, fluid enters the conduit 
through a smoothly contracting inlet, which ensures that, without 
external perturbations, flows are laminar over the Reynolds number 
range shown in Fig. 1. Turbulence is triggered 120d from the inlet 
(where d is the pipe diameter; see Methods) by a short-duration, loca- 
lized perturbation. A pressure sensor at the outlet determines the 
subsequent arrival of first the downstream and then the upstream 
turbulent-laminar front. Speeds are averaged over many realizations 
for each R, corresponding to a total travel distance of typically 
5 × 10*d. As an independent verification, speeds in pipe flow were
determined from direct numerical simulations in pipes of length
180d, with averaging over typically 20 runs. 

In both pipe and square-duct flows, initially the speeds of the down- 
stream fronts are indistinguishable from the upstream ones, signalling 
localized turbulence. For R ≳ 2,250 in pipe flow and R ≳ 2,030 in duct
flow, the downstream speed increases with R; these values mark the 
point where turbulence begins to aggressively invade the surrounding 
fluid. With further increases in R, the downstream front speeds exhibit 
complex changes of curvature as a function of R. The spreading of 
turbulence shows neither a square-root scaling nor an exponent associated
with a percolation-type process, as proposed in earlier studies;
the speed of the downstream spreading exhibits far more complex 
behaviour than these theories imply. 

In a previous theoretical approach, puffs in pipe flow were categorized
as localized excitations, analogous to action potentials in
axons, from which the numerous features of puff turbulence were 
captured. However, in that model, the transition leading to an expand- 
ing state is first-order (discontinuous), which does not reflect the 
observed continuous behaviour at the onset of fully turbulent flow 
(Fig. 1c). Moreover, this model did not include nonlinear advection, 
a feature intrinsic to fluid dynamics. We have devised an extended 
model incorporating an advective nonlinearity that enables us to fully 
capture the sequence encountered in the transformation to fully tur- 
bulent flow. The model is 


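\[
\begin{aligned}
q_t + (u - \zeta)\,q_x &= f(q, u) + D\,q_{xx},\\
u_t + u\,u_x &= \epsilon\,g(q, u)
\end{aligned}
\tag{1}
\]


where


\[
\begin{aligned}
f(q, u) &= q\left(r + u - 2 - (r + 0.1)(q - 1)^2\right),\\
g(q, u) &= 2 - u + 2q(1 - u)
\end{aligned}
\tag{2}
\]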


and the subscripts denote partial derivatives. The variables q and u 
depend only on the streamwise coordinate x and time t. q denotes the 
turbulence level within the flow, which is physically representative of a 
cross-sectional integration of the turbulent fluctuations. u represents 
the centreline velocity of the fluid and plays two important roles: it 
accounts for nonlinear advection in the streamwise direction and cap- 
tures the physical state of the shear profile, with u = 2 corresponding 
to parabolic flow and u<2 to plug flow. The functions f(q, u) and 
g(q,u) describe, with minimal nonlinearities, the known interplay 
between turbulence (the excited state) and the shear profile. An
explicit derivation of these functions from the Navier-Stokes equations
has yet to be achieved. The parameter r models the Reynolds
number, ζ accounts for the fact that turbulence is advected more slowly
than the centreline velocity, D controls the coupling strength of the
turbulent patches to the laminar flow (via diffusion) and ε sets the
timescale ratio between the fast excitation of q and the slow recovery of
u following relaminarization; see Methods for details.
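
The local dynamics defined by f and g can be explored with a short sketch. It assumes the reading f(q, u) = q(r + u − 2 − (r + 0.1)(q − 1)²), which makes f cubic in q as stated in the Fig. 2 caption, and it counts turbulent (q > 0) equilibria by scanning along the u nullcline. The function names, the scan range and the grid resolution are illustrative choices, not part of the model.

```python
def f(q, u, r):
    """Local turbulence kinetics; cubic in q."""
    return q * (r + u - 2.0 - (r + 0.1) * (q - 1.0) ** 2)


def g(q, u):
    """Local response of the centreline velocity to turbulence."""
    return 2.0 - u + 2.0 * q * (1.0 - u)


def u_nullcline(q):
    """Value of u for which g(q, u) = 0."""
    return (2.0 + 2.0 * q) / (1.0 + 2.0 * q)


def count_turbulent_fixed_points(r, q_max=3.0, steps=3000):
    """Count equilibria with q > 0 as sign changes of f/q along the u nullcline."""
    def h(q):
        return r + u_nullcline(q) - 2.0 - (r + 0.1) * (q - 1.0) ** 2

    count = 0
    previous = h(q_max / steps)
    for i in range(2, steps + 1):
        current = h(q_max * i / steps)
        if previous * current < 0.0:
            count += 1
        previous = current
    return count


# Laminar parabolic flow (q = 0, u = 2) is an equilibrium for any r.
print(f(0.0, 2.0, 0.5), g(0.0, 2.0))

# r = 0.5 and r = 1.2 are the values quoted for the phase-plane panels of Fig. 2.
for r in (0.5, 1.2):
    print(r, count_turbulent_fixed_points(r))
```

In this sketch r = 0.5 yields no equilibrium other than the laminar one (an excitable system), whereas r = 1.2 yields two additional equilibria, one unstable and one on the upper branch (a bistable system).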

Figure 1 | Localized and fully turbulent flow.
a, b, Numerical simulations of pipe flow illustrate
the distinction between localized turbulence at
R = 2,200 (a) and fully turbulent flow at R = 5,000
(b). In each case, the flow is initially seeded with
localized turbulent patches and the subsequent
evolution is visualized via space-time plots in a
reference frame co-moving with the structures.
Colours indicate the value of √(u_r² + u_θ²), where
u = (u_r, u_θ, u_x) is the velocity vector at the given
point, expressed in cylindrical coordinates.
Cross-sections of instantaneous flow within the pipe are
shown above the plots to further highlight the
distinction between the two regimes; a 35d section
is shown with the vertical direction (the pipe
cross-section) stretched by a factor of two. Although the
protocol used here is seeding the flow with localized
patches of turbulence, the fundamental distinction
between localized and fully turbulent flow is
independent of how turbulence is triggered.

To elucidate the core of the transition from localized to expanding
excitations, and to identify the different states occurring in the process,
we carry out a standard asymptotic analysis in the limit of sharp
laminar-turbulent fronts (ε → 0). Three distinct turbulent structures
are predicted: a localized state (Fig. 2a), an asymmetric expanding state
(Fig. 2b) and a symmetric expanding state (Fig. 2c). The essence of
each state is seen in the local phase plane (Fig. 2d-f). Equilibrium 
points are located at the intersections of the q and u nullclines (curves 
where time derivatives of u and q are zero). For low values of r (Fig. 2d) 
the only equilibrium is (u=2, q=0), corresponding to parabolic 
laminar flow. Nevertheless, the system can be excited locally; when 
perturbed, the state jumps to the upper branch q⁺. This forms the
upstream laminar-to-turbulent front. On the upper branch, u̇ < 0
(where the overdot indicates differentiation with respect to time)
and u decreases to a point where turbulence is not maintained and
the system jumps back to q = 0, forming the downstream front. The
downstream front follows the upstream one at a fixed distance, thus
creating a localized excitation: a puff in pipe flow analogous to an
action potential in excitable media.

For larger values of r, a second stable equilibrium appears (upper- 
most intersection of the nullclines in Fig. 2e, f) and the system is now 
bistable. Here, fully turbulent flow begins to arise. The downstream 
front lags the upstream front, giving rise to a growing turbulent region 
between the fronts. Initially the expansion is asymmetric and the 
spreading rate is modest (Fig. 2b, e); the fronts themselves are not very 
different in appearance from those of the localized state. The downstream
front occurs at u < 2 and is formed by a drop directly from the
upper equilibrium q = q⁺ to q = 0 (Fig. 2e). We refer to this as the
‘weak front state’. For larger r the weak front becomes unstable, giving
rise to the final state, a much more rapidly expanding ‘strong front
state’ (Fig. 2c, f). The strong downstream front occurs at u = 2 and is
the mirror image of the upstream front. As seen in Fig. 2c, the value of
q increases above q⁺ just before the drop to q = 0 at the downstream
front. The downstream speed is opposite to the upstream speed with
respect to what we term the ‘neutral speed’. 

Before comparing the model to the experimental data, we discuss 
features of the front-speed scaling that are intrinsic to this model. 
Figure 2g shows front speeds of the three states. (From the asymptotic 
analysis presented in the Methods, the front speeds explicitly scale as 
√D; the results in Fig. 2 are for D = 0.13.) Starting at low r, excitations
are strictly localized and their speed monotonically decreases with r
(red curve in Fig. 2g). Expanding turbulence is first encountered when 
this curve intersects the weak-front curve (green in Fig. 2g). The tur- 
bulent state (upper fixed point in Fig. 2e, f) bifurcates at lower r, but 
initially the downstream speed is smaller than the upstream one, 
resulting in a contraction back to a localized excitation. Thus onset 
of bistability and the expansion do not coincide, masking the transition 
and resulting in a non-standard front speed scaling (in contrast to the 
case without nonlinear advection shown in Extended Data Fig. 1a). 
The strong front (blue in Fig. 2g) is stable at slightly higher r (solid 
portion of the curve) and is perfectly symmetric to the downstream 
front (red in Fig. 2g) about the neutral speed. In the asymptotic limit
(ε → 0), weak and strong fronts co-exist over a range of r, but for finite
ε the front speed continuously varies from a weak to increasingly
strong front (solid black curve in Fig. 2g). During this adjustment 
the front speed exhibits two curvature changes. This, together with 
the eventual approach to the upper branch of the parabola, is a distinct 
signature of the scenario described by this model. 

Using the theoretical model as a guide, we combine the measured 
front speeds from pipe and duct flow and compare them directly with 
theory (Fig. 3a). Initially, at lower values of R, turbulent excitations are 
localized (as illustrated for duct flow in Fig. 3b and pipe flow in Fig. 3e) 
and the front speed data from both flows agree very well with the 
parabolic scaling predicted by the model asymptotics (solid red curve 
in Fig. 3a). At R ≈ 2,250 in pipe flow and R ≈ 2,030 in duct flow,
expansion begins with the formation of the weak downstream front 
(illustrated for duct and pipe flow in Fig. 3c, f, respectively). Although 
upstream fronts of both data sets continue to follow the simple asymp- 
totic form, the weak downstream fronts do not display the same 
scaling. Nevertheless, with appropriate choices of the parameters 
and ¢, the model precisely captures the two curvature changes (solid 
black curves in Fig. 3a) encountered as each flow continuously adjusts 
from the weak front (green dashed line in Fig. 3a) to the strong front 
(blue dashed line in Fig. 3a), corresponding to the emergence of the 
final strong front state (Fig. 3d, g). As the downstream front
approaches the scaling given by the strong-front asymptotics, its speed
forms a parabola with the upstream front speed, a feature overlooked
in previous studies.

Weak fronts move more slowly than the bulk advection velocity
of turbulence; once the downstream front speed exceeds the bulk
advection velocity, the front switches to a strong front. At that
point, a turbulent patch invades (nearly) fully recovered laminar
flow at the downstream front, in much the same way that turbulence
invades fully recovered laminar flow at the upstream front. This
produces the symmetry between the upstream and strong downstream
fronts.

There are two features of pipe and duct turbulence that the model
does not capture. Both originate from stochastic fluctuations within
turbulence and are most prevalent when turbulence first begins to
expand (Fig. 3c, f). Fronts fluctuate, especially the downstream front,
and it is common for the system to sometimes exhibit a strong and
sometimes a weak downstream front. The bifurcation scenario predicted
by the model is only recovered in average quantities. Likewise,
turbulence for 2,250 < R < 3,000 is not always uniform, but commonly
contains intermittent laminar pockets.

Figure 2 | Model predictions in the asymptotic limit of sharp laminar-turbulent
fronts. a-c, Three distinct types of predicted states. c_up and c_down
are the upstream and downstream front speeds. d-f, Corresponding states
viewed in the local phase plane, with arrows indicating increasing x (not time).
The q nullcline (f(q, u) = 0; cyan, for clarity labelled only in f) has three
branches because f(q, u) is cubic in q: two stable branches, laminar q⁰ = 0 and
the upper q = q⁺ branch, and an unstable branch q = q⁻ separating the two
stable branches. The u nullcline (g(q, u) = 0; magenta) describes the decrease in
the centreline velocity in the presence of turbulence and its recovery in the
absence of turbulence. Fronts are formed when the system jumps between
stable branches of the q nullcline. In all cases, the upstream front is a transition
from laminar flow (the equilibrium at u = 2, q = 0, indicated by a filled black
circle) to the upper branch q = q⁺. (These fronts are shown as red lines
with up arrows in d-f; the corresponding speeds are indicated by red arrows in
a-c.) The cases are distinguished by the downstream front. In a and d (r = 0.5),
the system is excitable and the downstream transition, from q⁺ to q⁰ (red
line with down arrow in d), is unrestricted by the upper branch and the speed
(also red in a) will be selected to match the upstream front c_down = c_up, yielding
localized turbulence. In b and e (r = 1.2), the system has become bistable
with the formation of an upper-branch steady state (indicated by the second
filled black circle). There is also an unstable fixed point, indicated by the
open circle. Evolution on the upper branch is restricted by this state so the
downstream front speed may no longer be able to match the upstream front
speed: in general c_down ≠ c_up. Consequently, the turbulent patch expands. In
c and f (r = 1.8), the upstream and downstream fronts have the same character,
but move in opposite directions, c_down = −c_up, in a reference frame moving
at the neutral speed. We refer to the downstream fronts in b and e as ‘weak
fronts’ (shown in green) and those in c and f as ‘strong fronts’ (shown in blue).
g, Front speeds as a function of model Reynolds number r. Upstream and
localized downstream speeds (red), weak front speeds (green), and strong front
speeds (blue) are from equations (10) and (11) in Methods, with solid lines
indicating stable fronts as ε → 0. The nominal critical point for the onset of fully
turbulent flow is masked. The neutral speed is the speed about which the
upstream and strong downstream front speeds are symmetric. At finite ε, the
transition from weak to strong scaling is continuous (black curve).

The simplicity of the model permits investigation of new phenom- 
ena associated with fully turbulent flow. In the model the creation of 
extended turbulent regions hinges on the upper intersection of the q 
and u nullclines, and by manipulating u this fixed point can be 
destroyed (see Methods). Likewise for pipe flow an analogous profile 
manipulation leads to a reverse transition. As demonstrated in the 
Methods, fully turbulent flow is eliminated and only localized excita- 
tions remain, offering a very simple and robust way to control tur- 
bulence and to reduce frictional drag. 

Although much progress has been made in our understanding of 
how turbulence in wall-bounded flows is formed from unstable invari- 
ant solutions** °° at moderate R, little to no progress has been made in 
connecting this transitional regime to studies of high-R turbulence. 
Explaining the origin of the fully turbulent state is a decisive step 
towards connecting these regimes and paves the way for a bottom- 
up approach to turbulence. 

Figure 3 | The rise of fully turbulent flow. a, Front speeds as a function of
Reynolds number for pipe and duct flow. Points are experimental results from
Fig. 1c. Red, blue and dark green curves are front speeds in the asymptotic
limit of sharp fronts in q (as in Fig. 2g). The only model parameter used to
fit these curves is D = 0.13. Black curves are the downstream front speed at
finite front width (ε = 0.2, ζ = 0.79 for pipe flow and ε = 0.11, ζ = 0.56 for duct
flow). The distinct weak and strong asymptotic branches (dashed) form the
skeleton for the formation of fully turbulent flow, while at finite front width
the model captures the complex behaviour of front speeds as a smooth
switching between the asymptotic branches. b-d, Cross-stream velocity
fluctuations v’/U for the three front states in a square duct: localized puff
(R = 1,700), expanding turbulence with a weak downstream front (R = 2,300)
and the strong front state (R = 3,000), which exhibits the characteristic energy 
overshoot at the downstream edge” (the arrows to a indicate the Reynolds 
number to which b-d correspond). e-g, Space-time plots from simulations of 
pipe flow at R = 2,000, R = 2,800 and R = 4,500, respectively (as indicated by 
the arrows to a). $\sqrt{u_r^2 + u_\theta^2}$ is plotted in the reference frame moving at the
neutral speed. White lines indicate front speeds from the model converted to 
physical units. At R = 2,000, turbulence is localized with equal upstream and 
downstream front speeds. At R = 4,500, turbulence expands with a strong 
downstream front and the long-time flow is fully turbulent. The upstream and 
downstream fronts have the same character (compare with the symmetric 
overshoot in Fig. 2d) and the spreading is symmetric in the neutral reference 
frame. At R = 2,800, the downstream front moves at a speed between the weak 
and strong branches and exhibits some characteristics of both fronts as
it fluctuates. This, as well as the intermittent laminar patches appearing within
the turbulent flow, is typical of turbulence as fully turbulent flow first arises. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

METHODS


Speed measurements. Speeds of laminar-turbulent fronts were measured in 
experiments and highly resolved computer simulations. In both cases, long obser- 
vation times were necessary to average out stochastic fluctuations that, although 
intrinsic to turbulence, may disguise the underlying transition scenario. All mea- 
sured speeds are nondimensionalized by the mean streamwise velocity U. Times 
are reported in units of d/U for pipe flow and h/U for duct flow, where d is the pipe 
diameter and h is the duct width. The corresponding Reynolds numbers for the 
two flows are R = Ud/ν and R = Uh/ν, where ν is the kinematic viscosity.
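
As a rough illustration of these units, the sketch below converts a Reynolds number into a mean velocity and an advective time unit for the two length scales quoted in the text (d = 10 mm, h = 5 mm). The viscosity value is an assumption (water near 20 °C) and is not taken from the paper; the function names are hypothetical.

```python
# Illustrative only: the viscosity is an assumed value for water near 20 C,
# not a number reported in the paper.
NU_WATER = 1.0e-6  # kinematic viscosity in m^2/s (assumed)


def mean_velocity(reynolds, length_scale, nu=NU_WATER):
    """Mean streamwise velocity U (m/s) for which R = U * L / nu."""
    return reynolds * nu / length_scale


def advective_time_unit(reynolds, length_scale, nu=NU_WATER):
    """One advective time unit L / U, in seconds."""
    return length_scale / mean_velocity(reynolds, length_scale, nu)


PIPE_D = 0.010  # pipe diameter d = 10 mm
DUCT_H = 0.005  # duct width h = 5 mm

for name, scale in (("pipe", PIPE_D), ("duct", DUCT_H)):
    u_mean = mean_velocity(2000, scale)
    t_unit = advective_time_unit(2000, scale)
    print(f"{name}: U = {u_mean:.3f} m/s, time unit = {t_unit * 1e3:.1f} ms")
```

With the assumed viscosity, R = 2,000 in the 10-mm pipe corresponds to a mean velocity of about 0.2 m/s.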

Pipe experiments. Experiments were carried out in a pipe with a diameter d = 10 
mm (±0.01 mm) and a length of 1,500d. The 15-m-long pipe was assembled on a
straight aluminium base and made of precision bore glass tubes with lengths of 
1-1.2 m. Customized connectors made from perspex allowed an accurate fit of the 
pipe segments. A specially made pipe inlet consisting of several meshes and a 
smooth convergence from a 100-mm-wide section to the 10-mm pipe was used 
to avoid inlet disturbances and eddy formation (see ref. 17 for details). In this way,
the water flow could be held laminar for R > 8,000. 

The laminar flow was left to develop its parabolic velocity profile over a length of 
200d. At this downstream location, the flow was perturbed by an impulsive jet of 
water injected (for 10 ms) through a 1-mm hole in the pipe wall. The perturbed 
flow was left to develop into a turbulent patch over the next 250d and at this 
location (450d from the inlet), a pressure sensor recorded the arrival of the 
upstream and downstream laminar-turbulent interfaces. A second sensor was 
located a further 1,000d downstream (50d upstream of the pipe exit), once again 
determining the arrival of the interfaces so that the average interface speed over the 
intermediate stretch of 1,000d was measured. At each Reynolds number, the 
measurement of the interface velocity was repeated 10 times. 

The flow was gravity driven from a reservoir at a fixed height above the pipe exit. 

Because the turbulent fraction in the pipe is increasing over the course of a 
measurement, the overall drag in the pipe also increases (turbulent flow has a 
higher skin friction than does laminar flow). This unavoidably leads to a drop in 
the flow rate (and hence R) during a measurement. To minimize this effect, a large 
reservoir height was chosen; in this case 23 m above the pipe exit. A precision valve 
positioned directly in front of the pipe inlet was used to adjust the flowrate and 
hence to select R. For the Reynolds-number regime investigated here (R < 6,000), 
the total pressure drop across the pipe is much smaller than the 23-m water head, 
and most of the pressure drop occurs across the valve. The increase in drag caused
by the expansion of turbulence is only a small fraction (<0.5%) of the overall
pressure drop and hence, even at the highest Reynolds numbers investigated,
flow rates were constant to within 0.5% throughout the measurement.

Duct experiments. Experiments were carried out in a square duct with width
h = 5 mm and a length of 1,200h (6 m). The duct was made of eight perspex
sections precisely machined to an accuracy of ±0.01 mm. They were assembled
and mounted straight together on an aluminium frame. A well-designed entrance 
section consisting of a honeycomb and a convergent section, with an area ratio of 
25, allowed the flow to remain laminar up to at least R = 5,000. 

The flow was gravity driven from a reservoir at a fixed height and water was used 
as the working fluid. Analogous to the pipe experiment, a precision valve was 
positioned directly in front of the duct and was used to set the flowrate. The 
pressure drop across the valve was considerably larger than that across the pipe. 
The temperature of the water was controlled by means of a heat exchanger that the 
water had to pass before entering the pipe. Overall, an accuracy in R of better than 
0.5% was achieved for the investigated Reynolds number regime (R < 6,000). 

The flow was perturbed by injecting water through a 0.5-mm hole drilled in one 
wall of the duct, 120h downstream from the inlet. The duration of the perturbation 
(t) was varied with R so that in dimensionless units it corresponded to t/(h/U) = 5.
The evolution of the perturbation was then monitored at five locations where the 
pressure was recorded. The pressure sensors were positioned at 100h, 400h, 600h, 
800h and 1,000h downstream of the perturbation point. Sensors measured the 
pressure difference over 10h along the duct. The arrival times of both interfaces 
were detected at each location and the overall speeds were determined by a linear 
fit. For each R, we averaged the measurement over at least 50 realizations. 
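
The linear-fit step can be sketched as follows. This is a minimal illustration, assuming a least-squares fit of sensor position against arrival time; the text does not specify the fitting details, the function names are hypothetical, and the arrival times below are synthetic.

```python
import numpy as np

# Sensor positions downstream of the perturbation point, in units of h
# (values quoted in the text).
SENSOR_POSITIONS = np.array([100.0, 400.0, 600.0, 800.0, 1000.0])


def front_speed(arrival_times, positions=SENSOR_POSITIONS):
    """Least-squares front speed in units of U.

    Arrival times are in units of h/U, so the slope of position against
    time is already nondimensional.
    """
    slope, _intercept = np.polyfit(arrival_times, positions, deg=1)
    return slope


def mean_front_speed(realizations):
    """Average the fitted speed over repeated realizations."""
    return float(np.mean([front_speed(t) for t in realizations]))


# Synthetic data (made up): a front moving at 0.9 U with timing noise.
rng = np.random.default_rng(0)
TRUE_SPEED = 0.9
runs = [
    SENSOR_POSITIONS / TRUE_SPEED + rng.normal(0.0, 2.0, SENSOR_POSITIONS.size)
    for _ in range(50)
]
print(f"fitted speed = {mean_front_speed(runs):.3f} U")
```

Averaging over many realizations, as done in the experiments, reduces the scatter that individual fits inherit from fluctuating arrival times.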
Numerical simulations. We consider the motion of incompressible fluid driven 
through a circular pipe with a fixed mass flux. Normalizing lengths with the diameter 
d and velocities with the mean velocity U, the Navier-Stokes equations read 
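
$$\frac{\partial \mathbf{u}}{\partial t} + \mathbf{u}\cdot\nabla\mathbf{u} = -\nabla p + \frac{1}{R}\nabla^2\mathbf{u}, \qquad \nabla\cdot\mathbf{u} = 0,$$

where u is the velocity of the fluid and p is the pressure. These equations were solved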
in cylindrical coordinates (r, θ, x) using a code developed by A. P. Willis (ref. 31), which uses
a spectral finite-difference method with no-slip boundary conditions at the pipe wall,
u(1/2, θ, x, t) = 0, and periodicity in the axial direction. The pressure term was
eliminated from the equations by using a toroidal-poloidal potential formulation of the
velocity field, in which the velocity is represented by toroidal ψ and poloidal φ
potentials, such that $\mathbf{u} = \nabla\times(\psi\hat{\mathbf{x}}) + \nabla\times\nabla\times(\phi\hat{\mathbf{x}})$.

After projecting the curl and double curl of the Navier-Stokes equations onto 
the x axis, a set of equations for the potentials ψ and φ is obtained. A difficulty, due
to the coupled boundary conditions on the potentials, is solved with an influence- 
matrix method. In the radial direction, spatial discretization is performed using a 
finite-difference method with a 9-point stencil. Assuming periodicity in azimuthal 
and axial directions, the potentials are expanded in Fourier modes 

$$A(r,\theta,x,t) = \sum_{k}\sum_{m} A_{k,m}(r,t)\,e^{i(\alpha k x + m\theta)}, \qquad (3)$$

where αk and m are the wavenumbers of the modes in the axial and azimuthal
directions respectively, 2π/α fixes the pipe length $L_x$, and $A_{k,m}$ is the complex
Fourier coefficient of mode (k, m). The time-dependent equations are integrated
in time using a second-order predictor-corrector scheme with a dynamic
timestep size, which is controlled using information from a Crank-Nicolson
corrector step. The nonlinear term is evaluated using a pseudo-spectral technique
with the 3/2-rule for de-aliasing. Using the expansion in equation (3), the
resultant linear differential equations for the potentials ψ and φ decouple for
each (k, m) mode. This linear system is solved using LU decompositions of
the resultant banded matrices; see ref. 31 for more details of the formulation
and solution.
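
To illustrate the 3/2-rule mentioned above, the following one-dimensional NumPy sketch evaluates a product of two periodic fields with and without de-aliasing. It is only an illustration of the technique on a single periodic direction, not the pipe-flow code of ref. 31; the function name and the choice to discard the Nyquist mode are assumptions.

```python
import numpy as np


def dealiased_product(u_hat, v_hat, n):
    """Fourier coefficients of u*v on an n-point periodic grid (3/2-rule).

    u_hat and v_hat are np.fft.rfft coefficients of length n//2 + 1, with
    n even. Modes 0..n//2 - 1 are retained; the Nyquist mode is discarded.
    """
    m = 3 * n // 2            # size of the padded (finer) grid
    keep = n // 2             # number of retained modes
    u_pad = np.zeros(m // 2 + 1, dtype=complex)
    v_pad = np.zeros(m // 2 + 1, dtype=complex)
    u_pad[:keep] = u_hat[:keep]
    v_pad[:keep] = v_hat[:keep]
    # The factor m/n makes irfft on the finer grid reproduce the same field.
    u_fine = np.fft.irfft(u_pad * (m / n), n=m)
    v_fine = np.fft.irfft(v_pad * (m / n), n=m)
    w_pad = np.fft.rfft(u_fine * v_fine) * (n / m)
    w_hat = np.zeros(n // 2 + 1, dtype=complex)
    w_hat[:keep] = w_pad[:keep]   # truncate back to the original resolution
    return w_hat


n = 8
x = 2.0 * np.pi * np.arange(n) / n
u_hat = np.fft.rfft(np.cos(3.0 * x))
w_hat = dealiased_product(u_hat, u_hat, n)     # cos(3x)^2 = 1/2 + cos(6x)/2
naive = np.fft.rfft(np.cos(3.0 * x) ** 2)      # product taken on the coarse grid
print(np.round(w_hat.real, 12), np.round(naive.real, 12))
```

On the coarse grid the unresolved mode 6 is aliased onto mode 2, whereas the padded product keeps only the resolved mean value.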

Initial conditions were prepared at R = 2,000 in 133d and 180d pipes for simu- 
lations at R> 2,000. At low R ~ 2,000, puff-splitting is extremely unlikely’ and 
puffs remain approximately constant in length (about 20d) as they travel down- 
stream along the pipe. Hence, simulations at R = 1,910, 1,920, 2,000 were carried 
out in a shorter 24πd ≈ 75d pipe, with initial conditions prepared at R = 1,950. The
lengths of the pipes and numerical resolutions used at each Reynolds number are 
listed in Extended Data Table 1. 

The fronts were detected by setting an appropriate cut-off. Here, the local 
intensity was computed as 
and a cut-off of 5 X 10° * was chosen for all the simulations to determine the
position of the laminar-turbulent fronts. We tested different cut-off values and 
found that the front speed was insensitive to the chosen value. 
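
A minimal sketch of cut-off-based front detection is given below. The intensity profile is synthetic, the cut-off values are arbitrary illustrations, and the function name is hypothetical; the sketch also assumes a single turbulent patch that does not wrap around the periodic domain.

```python
import numpy as np


def detect_fronts(x, intensity, cutoff):
    """Return (upstream, downstream) positions of the turbulent region.

    The fronts are taken as the first and last grid points at which the
    local intensity exceeds the cut-off. Returns None if the flow is
    laminar everywhere.
    """
    above = np.flatnonzero(np.asarray(intensity) > cutoff)
    if above.size == 0:
        return None
    return x[above[0]], x[above[-1]]


# Synthetic profile (made up): a smooth turbulent patch between x = 30 and 50.
x = np.linspace(0.0, 100.0, 2001)
intensity = 0.0025 * (1.0 + np.tanh(x - 30.0)) * (1.0 + np.tanh(50.0 - x))

for cutoff in (1e-4, 5e-4, 1e-3):
    print(cutoff, detect_fronts(x, intensity, cutoff))
```

Changing the cut-off shifts both detected positions slightly, but a front speed depends only on how a position changes in time, which is consistent with the reported insensitivity to the chosen value.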

The expansion speed of the downstream front was found to accelerate substan- 
tially during the initial stages of the simulation. To obtain the asymptotic value of 
the speed, we determined the length of the turbulent region $L_0$ beyond which the
speed statistics become length-independent. We found that for R < 4,000,
$L_0$ > 60d was sufficient, whereas for R ≥ 4,000, $L_0$ > 100d was required. This is
the reason why very long pipes were used, as reported in Extended Data Table 1. At
each R, the speed was determined by computing $(x_{\rm end} - x_0)/(t_{\rm end} - t_0)$ for each
run and then averaging over a total of 20 runs. The initial time $t_0$ corresponds to
the time at which the turbulent region has reached the length $L_0$.

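Model details. The model is a two-component system of advection-reaction-diffusion equations

$$\frac{\partial q}{\partial t} + (u - \zeta)\frac{\partial q}{\partial x} = f(q,u) + D\frac{\partial^2 q}{\partial x^2}, \qquad \frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} = \epsilon\, g(q,u), \qquad (4)$$

where q represents the level of turbulent fluctuations and u the axial velocity at the
centreline. The nonlinear reaction functions f(q, u) and g(q, u) are

$$f(q,u) = q\left(r + u - 2 - (r + 0.1)(q - 1)^2\right), \qquad g(q,u) = 2 - u + 2q(1 - u),$$

where the parameter r corresponds to the model Reynolds number.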

The model and the role of the fitting parameters (D, ζ and ε) are most easily
understood by first considering the equations in the absence of spatial derivatives. 
In this case, the model reduces to the ordinary differential equations (ODEs) 


$$\dot{q} = f(q,u), \qquad \dot{u} = \epsilon\, g(q,u),$$

where the overdot indicates differentiation with respect to time. These ODEs are
the core of the model as they describe the interaction between the turbulent
fluctuations q and axial velocity u locally in space. The functional forms are
designed to qualitatively capture the well-established physics of this interaction”
with minimal nonlinearities. (In a previous approach”, the variable u
corresponded to the axial velocity of pipe flow in the frame of reference moving at
the mean or bulk velocity U; here, u corresponds to velocity in the lab frame so that
u = 2 for laminar flow.)

The nullclines for the ODEs are f(q, u) = 0 and g(q, u) = 0. For all parameter
values, these nullclines intersect at the fixed point (u = 2, q = 0) corresponding to
laminar, Hagen-Poiseuille flow. ε sets the ratio of the timescale of u relative to q.
(Previously, two parameters $\epsilon_1$ and $\epsilon_2$ appeared in the model; here we have
simplified the model to a single timescale ratio ε, where $\epsilon_1 = \epsilon$ and $\epsilon_2 = 2\epsilon$.)
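
The local dynamics can be explored numerically. The sketch below, with the reaction functions written as f(q, u) = q(r + u − 2 − (r + 0.1)(q − 1)²) and g(q, u) = 2 − u + 2q(1 − u), locates the fixed points with q > 0 for the three values of r used in Fig. 2. The function names and the reduction to a cubic polynomial are illustrative choices, not part of the original analysis.

```python
import numpy as np


def f(q, u, r):
    """Reaction term for the turbulence level q."""
    return q * (r + u - 2.0 - (r + 0.1) * (q - 1.0) ** 2)


def g(q, u):
    """Reaction term for the centreline velocity u."""
    return 2.0 - u + 2.0 * q * (1.0 - u)


def u_nullcline(q):
    """Solve g(q, u) = 0 for u."""
    return (2.0 + 2.0 * q) / (1.0 + 2.0 * q)


def turbulent_fixed_points(r):
    """Fixed points with q > 0, returned as (q, u) pairs sorted by q.

    Substituting the u nullcline into f/q = 0 and multiplying by (1 + 2q)
    gives the cubic -2a q^3 + 3a q^2 + (2a - 2.2) q - 0.1 = 0, a = r + 0.1.
    """
    a = r + 0.1
    roots = np.roots([-2.0 * a, 3.0 * a, 2.0 * a - 2.2, -0.1])
    qs = sorted(z.real for z in roots if abs(z.imag) < 1e-9 and z.real > 0.0)
    return [(q, u_nullcline(q)) for q in qs]


for r in (0.5, 1.2, 1.8):
    print(r, turbulent_fixed_points(r))
```

For r = 0.5 only the laminar fixed point (q = 0, u = 2) exists, whereas for r = 1.2 and r = 1.8 two additional fixed points appear, matching the excitable and bistable cases described for Fig. 2.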

Now consider the full model equations. In addition to the local terms given by 
f(q,u) and g(q, u), the model has first and second spatial derivatives. The first- 
derivative terms account for nonlinear advection in the streamwise direction. For 
the u equation, we use the advective nonlinearity that follows directly from the 
Navier-Stokes equations. The parameter ζ accounts for diminished advection of q
in comparison with the centreline velocity u. The streamwise velocity is maximal 
on the centreline and the turbulent field is not advected at this speed. We simulated 
turbulent flow in short pipes (L = 12d) and verified that turbulent structures are 
advected considerably more slowly than is the centreline velocity. This effect leads 
to complex processes in the pipe cross-section. We include in the model the 
simplest term that can describe the diminished advection. (Previously”, the model 
contained only linear advection; the fixed difference in the advection of the q and u 
fields was expressed by an additional first-derivative term on the right-hand side of 
the u equation, which effectively corresponded to ζ = 1 in the current model.) We
describe the importance of the parameter ζ after we derive expressions for front
speeds in the model. 

The diffusive term in equation (4) accounts for the processes by which a region of 
turbulent flow couples to, and thereby excites, adjacent laminar flow. The physical 
processes involved are complex and not fully understood*”!*°****, However, the 
second-derivative is the most natural choice for modelling such a coupling. The 
coupling strength or diffusion coefficient D is the final model parameter. 
Asymptotic analysis. The asymptotic analysis follows very closely that of ref. 26. 
Let the three roots of f(q, u) be denoted $q^0$, $q^\pm$. The laminar branch is $q^0 = 0$ for all
u and r, whereas the upper and lower branches $q^\pm$ are functions of u and r. The
laminar $q^0$ and upper $q^+$ branches are stable. For small ε, the dynamics of
the system separate into slow regions and fast front regions. In the slow regions,
the system is ‘slaved’ to one of the stable branches (slow manifolds) and u evolves
on a slow scale; for example, along the upper branch $q^+$

$$q = q^+(u), \qquad \frac{\partial u}{\partial t'} + u\frac{\partial u}{\partial x'} = g(q^+(u), u),$$

where x′ = εx and t′ = εt are the slow scales.

In the fast regions, fronts in q are formed as the system transitions between the
stable branches: from $q^0$ to $q^+$ as x increases for an upstream front and from $q^+$ to
$q^0$ for a downstream front. Let c denote the speed of the front and consider a frame
of reference moving at speed c. We set the location of the now stationary front at
x = 0 and work in an inner (stretched) variable $x/\sqrt{D}$. To leading order in ε, the
equations in the stretched coordinate become


$$q'' + s q' + f(q, u) = 0 \qquad (5)$$

$$u' = 0 \qquad (6)$$

where

$$s = \frac{c - (u_f - \zeta)}{\sqrt{D}}$$

and the prime denotes differentiation with respect to the stretched variable. From
equation (6), u is constant to leading order across a front; we denote this constant
value $u_f$. Equation (5) must be solved subject to boundary conditions, which for a
downstream front are


$$q(-\infty) = q^+(u_f), \qquad q(+\infty) = q^0 \qquad (7)$$

For an upstream front, the boundary conditions are reversed, but this can be
accounted for by a change of sign of s in equation (5). These inner solutions
determine the shape of the front. In the original length scale x, the front thickness
scales as $\sqrt{D}$.

In summary, the speed of a front at a given value of $u = u_f$ is found by solving

$$q'' + s q' + f(q, u_f) = 0 \qquad (8)$$

subject to the boundary conditions in equation (7). This gives a value of s that is a
function of both $u_f$ and r, and which we denote by $s(u_f, r)$. From this the front speed is

$$c = u_f - \zeta \pm \sqrt{D}\, s(u_f, r) \qquad (9)$$

with + for a downstream front and − for an upstream front. For the strong
downstream front and all upstream fronts, $u_f = 2$. Hence their speeds are

$$c = 2 - \zeta \pm \sqrt{D}\, s(2, r) \qquad (10)$$

For the weak downstream front, $u_f = u_{ss}$, where $u_{ss}$ is the upper-branch steady state.
Hence

$$c = u_{ss} - \zeta + \sqrt{D}\, s(u_{ss}, r) \qquad (11)$$

Extended Data Figure 1 shows model front speeds as a function of model Reynolds
number (Extended Data Fig. 1b is the same as Fig. 2g); speeds are from equations (10) 
and (11). 
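
The front-speed expressions can be evaluated with a short script. The sketch below assumes the reaction term f(q, u) = q(r + u − 2 − (r + 0.1)(q − 1)²) and uses the standard closed-form eigenvalue for travelling fronts of a cubic reaction term, s = sqrt(a/2)(q⁺ − 2q⁻) with a = r + 0.1. The text states only that the boundary value problem is solved, so using this closed form is an assumption made here for illustration, and the function names are hypothetical.

```python
import math


def branches(u_f, r):
    """Nonzero roots q^- and q^+ of f(q, u_f) = 0 (needs r + u_f - 2 >= 0)."""
    a = r + 0.1
    root = math.sqrt((r + u_f - 2.0) / a)
    return 1.0 - root, 1.0 + root


def s_eigenvalue(u_f, r):
    """Eigenvalue s of q'' + s q' + f(q, u_f) = 0 for a front joining
    q^+ (behind) to q^0 = 0 (ahead); closed form for a cubic f."""
    a = r + 0.1
    q_minus, q_plus = branches(u_f, r)
    return math.sqrt(a / 2.0) * (q_plus - 2.0 * q_minus)


def front_speed(u_f, r, zeta, D, sign=+1):
    """Front speed c = u_f - zeta + sign * sqrt(D) * s(u_f, r).

    sign=+1 gives a downstream front, sign=-1 an upstream front.
    """
    return u_f - zeta + sign * math.sqrt(D) * s_eigenvalue(u_f, r)


# D is the value quoted in the text; zeta is the pipe-flow fit value.
D, zeta, r = 0.13, 0.79, 1.8
c_up = front_speed(2.0, r, zeta, D, sign=-1)
c_strong = front_speed(2.0, r, zeta, D, sign=+1)
print(c_up, c_strong, 0.5 * (c_up + c_strong))
```

The mean of the upstream and strong downstream speeds equals 2 − ζ, the neutral speed. A weak-front speed follows from the same function by passing the upper-branch steady-state value of u as `u_f` with `sign=+1`.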

The neutral speed in the model is c = 2 − ζ. This follows immediately from
equation (10) where one can see that the upstream speed (minus sign) and strong
downstream speed (plus sign) are symmetric with respect to 2 − ζ. This is the
advection speed of turbulence in the absence of front dynamics due to transitions
between laminar and turbulent flow. Without the parameter ζ, the neutral speed
would be the maximum centreline velocity. This is neither consistent with the 
observed neutral speed, nor is it reasonable that turbulent structures would be 
advected at the maximum speed found in the flow. 

Extended Data Figure 1a shows front speeds without the inclusion of advection 
terms (first derivatives in x) in the model equations. Without these terms the front 
speeds become 


$$c = \pm\sqrt{D}\, s(2, r)$$

for the strong downstream front and all upstream fronts, and

$$c = \sqrt{D}\, s(u_{ss}, r)$$


for the weak downstream front. The transition to expanding turbulence is discon- 
tinuous. Including linear advection (as was done previously”) will result in an 
overall shift in all front speeds, and can affect the asymptotic stability of branches, 
but will not change the discontinuous nature of the transition. 

This highlights the role of nonlinear advection in the bifurcation scenario: 
without the physical effect of nonlinear advection, the weak front branch has a 
distinct critical point and the transition to expanding turbulence is first-order 
(discontinuous). 

In Fig. 2a-c, solutions q(x) are obtained from the full model equations (4) with
ε = 0.002, which is sufficiently small that these solutions are visually close
approximations to the ε → 0 limit. Figure 2d-f shows the nullclines for the cases shown in
Fig. 2a-c; however, the trajectories in the phase portraits are sketches (with the
fronts coloured for clarity): even at this small ε, the jumps between the branches of
q are not completely vertical in the phase plane.

A further calculation determines the stability of the asymptotic branches (D.B., 
manuscript in preparation). The result is that the weak downstream front is stable 
in the asymptotic limit (ε → 0) if $c < u_f = u_{ss}$, whereas the strong downstream
front is stable in the asymptotic limit if $c > u_f = 2$. These criteria determine the
stable portions of the branches (plotted as solid) in Fig. 2g. 

There are many documented exact coherent structures in pipe flow. Most of 
these are spatially extended, in the form of travelling waves**°**, but spatially 
localized states have also been found****. The model captures these states in a 
minimal way. The fixed points $q^\pm$ (one stable and one unstable) arising as the
model transitions to bistability can be viewed as upper and lower branches of
spatially extended travelling-wave solutions. The cubic nonlinearity in f(q, u) is
the minimum requirement for this separation into upper and lower branch states. 
The model also has localized states (puffs) and, importantly, unstable small-ampli- 
tude localized solutions (not discussed here; see refs 22, 25, 26) corresponding to 
edge states, both in the puff regime and in the fully turbulent regime. 

Finally, we comment on what takes place at the critical point where the system 
first becomes bistable. As with all material in this section, the discussion follows 
closely refs 25, 26. Extended Data Figure 2 illustrates solutions to the boundary 
value problem in equation (8) in the case of a downstream front. For a fixed value 
of r, the eigenvalue s and solution q depend on $u_f$, as do the boundary conditions in
equation (7). 

Downstream fronts are heteroclinic connections from $q^+$ to $q^0$, where ‘time’ in
the phase plane corresponds to space (in the reference frame co-moving at the
front speed). The phase plane is two-dimensional, with coordinates q and q′, because
equation (8) is a second-order ODE. As illustrated in Extended Data Fig. 2c, for
generic $u_f$, both $q^+$ and $q^0$ are hyperbolic fixed points (saddles) in the phase plane
and a heteroclinic connection exists only for a unique value of s. This determines s
as a function of $u_f$ as shown by the bold curve in Extended Data Fig. 2a. However,
when $u_f$ is such that $q^+ = q^-$ (at the nose of the q nullcline), the upper fixed point
($q = q^+ = q^-$) is no longer hyperbolic and there exist infinitely many heteroclinic
connections from $q^+ = q^-$ to $q^0$, and hence infinitely many possible values of s.
These appear as the thin line in Extended Data Fig. 2a.

As the parameter r is varied (as in Fig. 2), the nullclines vary. The critical point is 
where the upper-branch steady state occurs at the limit point of the q nullcline, that 
is, the upper fixed point is at $q^+ = q^-$. For r smaller than this value, the downstream
front speed can take any of an infinite range of values because the downstream
front occurs at $u_f = u_n$. For a puff, this infinite range of possible values is the
mechanism that allows the speed of the downstream front to select the same value 
as for the upstream front. As a result, puffs remain localized while travelling along 
the pipe. However, for r larger than the critical value, the upper-branch fixed point 
no longer permits downstream fronts to occur at $u_f = u_n$, as seen in Fig. 2e. This
restricts the possible values of s and, hence, the possible speeds of the downstream 
front to the bold portion of the branch illustrated in Extended Data Fig. 2a. Hence, 
as r passes through the critical point there is an abrupt change in the allowed values 
of the downstream front speeds, from an infinite to a finite range. Without non- 
linear advection, the abrupt change is manifested as a discontinuous change in the 
speed of the downstream front. With nonlinear advection, there is still a discon- 
tinuous change to the allowed values of s, but the speeds are smaller than those of 
the upstream front, so the discontinuity in allowed solutions is masked. 
Combining pipe and duct data. To combine data from pipe and duct flow into a 
single plot, it is necessary to determine specific Reynolds numbers and speeds from 
measured data (see Fig. 3), which will then be used to align the data from the two 
flows. Although the procedure is informed by the model analysis, it requires
only measured data and the same procedure could be applied to data from other 
shear flows. 

Extended Data Figure 3a, c shows data from pipe and duct flow, respectively, 
plotted with the upstream speeds reflected about the neutral speed, labelled $C_0$.
The value of $C_0$ is determined to be that for which reflected upstream data
coincide with the downstream data at sufficiently large Reynolds number. Extended
Data Figure 3b, d shows the same data, but with model speeds (determined
subsequently) also plotted as a visual aid. In the case of pipe flow, it is possible to
determine $C_0$ to better than 2% accuracy. For duct flow, we estimate that the
downstream front speed has not quite reached the reflected upstream speed at
the highest Reynolds number accessible to present experiments. Nevertheless, $C_0$ is
quite well determined. From the same plots, the value of the Reynolds number $R_0$
at which the upstream front attains the neutral speed $C_0$ is easily determined.

We also determine the Reynolds number $R_1$ from the data, where the
downstream weak front first deviates from the downstream front. This can in principle
be determined solely from the data, but using model fits to the weak branch gives
further confidence in the determined values. $C_1$ is the front speed at $R_1$.

Once the values ($R_0$, $C_0$) and ($R_1$, $C_1$) have been found for each flow, the data is
collated by plotting each data set such that these two points collapse, as seen in 
Extended Data Fig. 4. This is equivalent to simply choosing the origin and scaling 
the axes for the two flows. The upstream and strong-downstream fronts each 
coincide, whereas the weak-front branch does not. 

Determining model parameters. There are three model parameters, D, ε and ζ, to
be determined to quantitatively relate the model speeds to the measured data for 
each flow. 

The generic model cannot be expected to predict the flow-specific values $R_0$, $R_1$,
$C_0$ and $C_1$, and, moreover, there is nothing universal about these values. Instead,
given these flow-specific values, the model should capture the form of the various 
branches seen in the combined data of Extended Data Fig. 4. When fitting model 
parameters, it is useful to plot the combined data in terms of the reduced Reynolds 
number and reduced speed 


$$\frac{R - R_0}{R_1 - R_0}, \qquad \frac{1}{2}\,\frac{C - C_0}{C_0 - C_1} \qquad (12)$$

which requires only relabelling of the axes in Extended Data Fig. 4 to shift the
neutral speed $C_0$ to zero and scale the onset of the weak front to the point
($R_1$, $C_1$) = (1, −1/2). As will become apparent, the reason for including 1/2 in
the reduced speed is so that model speeds are typically about half those of the 
reduced speeds from the experimental data. 

We first consider the value of the parameter D. We select D so as to fix a simple 
relationship between model and measured quantities for both flows. Specifically, 
in Extended Data Fig. 5a, we plot the combined pipe and duct data together with 
the asymptotic results from the model for different values of D. The model results
are plotted directly in terms of the model Reynolds number r and the model speed
shifted by the model neutral speed, $c - c_0$. For D = 0.13 the upstream front and
strong branches match the combined experimental data extremely well. Note that
the strong and weak asymptotic curves in Extended Data Fig. 5a are independent
of the other two model parameters, ε and ζ.

Using only one parameter, D, and fixing its value to 0.13, the model not only fits
the upstream and strong-downstream front speeds very well for both flows, but a
simple relationship between model and experimental data is fixed, namely

$$r = \frac{R - R_0}{R_1 - R_0}, \qquad c - c_0 = \frac{1}{2}\,\frac{C - C_0}{C_0 - C_1} \qquad (13)$$

Given the flow-specific values $R_0$, $R_1$, $C_0$ and $C_1$, equation (13) is inverted to obtain
the Reynolds number R and speed C from the model values r and c; this is how we 
map the model results to Reynolds number and speed in Fig. 3. 
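
The mapping between model and physical quantities can be sketched as below, assuming the relationship r = (R − R0)/(R1 − R0) and c − c0 = (1/2)(C − C0)/(C0 − C1), with the model neutral speed c0 = 2 − ζ. The flow-specific values are those listed in the caption of Extended Data Fig. 3; the function names are hypothetical.

```python
# Flow-specific reference values from the Extended Data Fig. 3 caption.
FLOW_VALUES = {
    "pipe": {"R0": 1920.0, "C0": 1.06, "R1": 2250.0, "C1": 0.92},
    "duct": {"R0": 1490.0, "C0": 1.12, "R1": 2030.0, "C1": 0.90},
}


def to_model(R, C, flow, zeta):
    """Map a measured (R, C) pair to model variables (r, c)."""
    v = FLOW_VALUES[flow]
    r = (R - v["R0"]) / (v["R1"] - v["R0"])
    c = (2.0 - zeta) + 0.5 * (C - v["C0"]) / (v["C0"] - v["C1"])
    return r, c


def to_physical(r, c, flow, zeta):
    """Invert the mapping: model (r, c) to Reynolds number R and speed C."""
    v = FLOW_VALUES[flow]
    R = v["R0"] + r * (v["R1"] - v["R0"])
    C = v["C0"] + 2.0 * (c - (2.0 - zeta)) * (v["C0"] - v["C1"])
    return R, C


zeta_pipe = 0.79  # pipe-flow fit value quoted in the Fig. 3 caption
c0 = 2.0 - zeta_pipe
print(to_physical(0.0, c0, "pipe", zeta_pipe))        # neutral point (R0, C0)
print(to_physical(1.0, c0 - 0.5, "pipe", zeta_pipe))  # weak-front onset (R1, C1)
```

By construction the neutral point maps to r = 0 with zero shifted speed, and the onset of the weak front maps to r = 1 with shifted speed −1/2.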

The remaining two model parameters dictate the behaviour of the downstream 
fronts as they transition from weakly expanding to strongly expanding. Here pipe 
and duct flows differ and so the values of the fitting parameters will necessarily be 
different for the two flows; see Extended Data Fig. 5b. 

The value of ε dictates how quickly the system jumps from the weak to the
strong branch. Large values give smoother transitions while smaller values give
more abrupt transitions. The value of ζ dictates how long the system follows the
weak branch before transitioning to the strong branch. Larger values, as for pipe
flow, result in a delay in transition, whereas smaller values, as for the fit to duct
flow, result in more immediate transition. We did not apply a formal procedure for
determining ε and ζ for each of the flows. Rather they were determined simply by
eye. In both cases it is quite easy to adjust ζ and ε so that the transition from weak to
strong fronts follows the measured data.
Control. The model suggests that the fully turbulent state can be destabilized by 
removing the upper turbulent fixed point as depicted in Extended Data Fig. 6a. In 
the model, this is achieved by forcing the variable u, which corresponds to the state 
of the shear profile. The reduction of u by forcing corresponds to a blunting of the 
shear profile. 

To demonstrate that the fully turbulent state can indeed be destabilized by 
removing the turbulent fixed point, as suggested by the model, we performed a 
direct numerical simulation of pipe flow for R = 5,000. Initially the forcing is not 
applied and the flow is fully turbulent. Starting at time t = 175 d/U, a global body 
force is gradually switched on (fully applied by time t = 200 d/U), which blunts the 
velocity profile to a more plug-like form (the same forcing is used as in ref. 18). As 
can be seen, turbulent intensity subsequently decreases, and eventually the fully 
turbulent flow destabilizes and degenerates into localized turbulent patches, sim- 
ilar to the natural ones (puffs) at lower Reynolds number (below about 2,300) in 
the absence of any additional force. 

Extended Data Figure 1 | Speed of model fronts in the asymptotic limit of
sharp fronts. a, b, Speeds as a function of model Reynolds number r both
without (a) and with (b) advection. Although strong downstream fronts
cannot exist and have no physical meaning below the formation of the upper
branch fixed point, the expression for strong front speeds in equation (10)
still gives the speed that such a strong downstream front would have; these
speeds are shown dashed. The effect of nonlinear advection in b is to mask
the nominal critical point for the onset of fully turbulent flow. The neutral speed
is naturally displaced from the mean speed U = 1.
Extended Data Figure 2 | Front speeds at critical point. Sketch illustrating
solutions to the boundary value problem in equation (8) for a downstream
front near the critical point. a, Eigenvalue s as a function of us. u, is the value
of ug such that q_ = q’. For this value there are infinitely many possible
eigenvalues s, indicated by the thin line. b, c, Phase planes (q, q') showing
solutions for the second order differential equation (8). Downstream fronts
are heteroclinic connections from the upper fixed point q* to the lower
fixed point q°. When us= u, and hence q_ = q", the upper fixed point is not
hyperbolic and there are infinitely many connections, each corresponding
to a value of s. When u > u,, q* is hyperbolic and there is a unique connection
and hence a unique value of s.
Extended Data Figure 3 | Determination of corresponding Reynolds
numbers and speeds for pipe and duct flow. a-d, Speeds from pipe (a, b) and
duct flow (c, d) are plotted, as in Fig. 1c, but additionally with the upstream
front speeds reflected about the neutral speed C0. a, c, Experimental and
simulation data only; b, d, model fits to the experimental and simulation
data. The determined values for R0, R1, C0 and C1 are: R0 = 1,920, C0 = 1.06,
R1 = 2,250 and C1 = 0.92 for pipe flow, and R0 = 1,490, C0 = 1.12, R1 = 2,030
and C1 = 0.90 for duct flow.
Extended Data Figure 4 | Combining pipe and duct data. Pipe and duct
flow are plotted together using different axes. The data are plotted so that
the two points (R0, C0) and (R1, C1) align for each data set; for example,
(R1, C1) = (2,250, 0.92) for pipe flow is aligned with (R1, C1) = (2,030, 0.90) for
duct flow, bringing into alignment the onset of weak fronts.
[Figure panels: a, pipe and duct flow; b, pipe flow, ζ = 0.79, ε = 0.05, 0.1, 0.2, 0.4, and duct flow, ζ = 0.56, ε = 0.05, 0.1, 0.2, 0.4. Horizontal axes: r and R − R0.]

Extended Data Figure 5 | Determination of model parameters for pipe and
duct flow. a, Determination of D. Points are data from pipe and duct flow (as in
Extended Data Fig. 4) here plotted in terms of reduced Reynolds number
(R − R0)/(R1 − R0) and reduced speed (C − C0)/[2(C0 − C1)]. Dashed curves
are asymptotic speed curves (as in Extended Data Fig. 1) plotted in terms of
model Reynolds number r and speed c − c0. For D = 0.13 there is very good
agreement between the data and the model. This choice of D fixes the
asymptotic branches (dashed curves). b, Determination of ζ and ε. Pipe and
duct flow are necessarily considered separately. In each case, downstream
branches are shown for four values of ε. Smaller values yield more abrupt
transitions between weak and strong branches.
Extended Data Figure 6 | Illustration of control by removing the
turbulent fixed point. a, Control concept illustrated in the model phase plane.
Without forcing (that is, without control), there is an upper-branch fixed
point (upper intersection of nullclines) corresponding to fully turbulent flow.
Applying an additive forcing term to the u equation corresponds to forcing the
shear profile and blunting its shape. This can remove the turbulent fixed
point thus eliminating fully turbulent flow. b, Proof of concept in a direct
numerical simulation of pipe flow at R = 5,000. Without forcing the flow is fully
turbulent. A global body force is applied that blunts the velocity profile to a
more plug-like form. Subsequently, only localized turbulent patches remain,
reminiscent of those at much lower R.
Extended Data Table 1 | The domain size and resolution for the simulations at all the Reynolds numbers we considered

R       L_z (d)   N     K       M
1920    247       48    640     32
2000    247       48    768     40
2200    133       48    768     40
2300    133       64    1536    40
2400    133       64    2048    48
2600    133       64    2048    48
2800    180       72    2560    48
3200    180       72    2560    54
3500    180       72    2560    54
3750    133       72    2048    54
4000    133       72    2048    54
4500    180       80    3072    64
5000    180       80    3072    64
5500    180       96    3840    80

In physical space there are 3K and 3M grid points in axial and azimuthal directions, respectively. N is the number of grid points across the pipe radius d/2.
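
As a rough illustration of the resolutions in Extended Data Table 1, the physical-space grid sizes follow directly from the table footnote (3K axial points, 3M azimuthal points, N radial points). The helper name `physical_grid_points` is hypothetical and is used here only for the sketch; it is not part of the simulation code.

```python
def physical_grid_points(N, K, M):
    """Return (radial, axial, azimuthal, total) grid-point counts.

    Assumes, as stated in the table footnote, 3K axial and 3M azimuthal
    points in physical space and N points across the pipe radius.
    """
    axial = 3 * K
    azimuthal = 3 * M
    return N, axial, azimuthal, N * axial * azimuthal


# Lowest and highest Reynolds numbers listed in the table.
for R, N, K, M in [(1920, 48, 640, 32), (5500, 96, 3840, 80)]:
    radial, axial, azimuthal, total = physical_grid_points(N, K, M)
    print(f"R = {R}: {radial} x {axial} x {azimuthal} = {total} points")
```

Under these assumptions the grid grows from roughly 9 million points at R = 1920 to roughly 265 million points at R = 5500.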
Observation of non-Hermitian degeneracies
in a chaotic exciton-polariton billiard


T. Gao, E. Estrecho, K. Y. Bliokh, T. C. H. Liew, M. D. Fraser, S. Brodbeck, M. Kamp, C. Schneider, S. Höfling,
Y. Yamamoto, F. Nori, Y. S. Kivshar, A. G. Truscott, R. G. Dall & E. A. Ostrovskaya


Exciton-polaritons are hybrid light-matter quasiparticles formed
by strongly interacting photons and excitons (electron-hole pairs)
in semiconductor microcavities. They have emerged as a robust
solid-state platform for next-generation optoelectronic applications
as well as for fundamental studies of quantum many-body
physics. Importantly, exciton-polaritons are a profoundly open
(that is, non-Hermitian) quantum system, which requires
constant pumping of energy and continuously decays, releasing
coherent radiation. Thus, the exciton-polaritons always exist in
a balanced potential landscape of gain and loss. However, the
inherent non-Hermitian nature of this potential has so far been
largely ignored in exciton-polariton physics. Here we demonstrate
that non-Hermiticity dramatically modifies the structure of modes
and spectral degeneracies in exciton-polariton systems, and, therefore,
will affect their quantum transport, localization and dynamical
properties. Using a spatially structured optical pump, we
create a chaotic exciton-polariton billiard, that is, a two-dimensional
area enclosed by a curved potential barrier. Eigenmodes of this
billiard exhibit multiple non-Hermitian spectral degeneracies,
known as exceptional points. Such points can cause remarkable
wave phenomena, such as unidirectional transport, anomalous
lasing/absorption and chiral modes. By varying parameters
of the billiard, we observe crossing and anti-crossing of energy levels
and reveal the non-trivial topological modal structure exclusive to
non-Hermitian systems. We also observe mode switching and
a topological Berry phase for a parameter loop encircling the exceptional
point. Our findings pave the way to studies of non-Hermitian
quantum dynamics of exciton-polaritons, which may
uncover novel operating principles for polariton-based devices.
Studies of open quantum systems go back to Gamow’s theory of
nuclear α-decay developed in the early days of quantum mechanics.
Indeed, metastable states of a single quantum particle in a spherically
symmetric potential well with semi-transparent barriers decay in time,
and therefore are characterized by complex energies. Furthermore,
introducing a 2D potential well with non-trivial geometry, that is, a
quantum billiard, results in strongly correlated energy levels and transition
to quantum chaos. Spectral degeneracies crucially determine
transport and dynamical properties in both non-Hermitian and
chaotic wave systems. In chaotic and disordered wave systems,
spectral degeneracies underpin statistical properties and quantum
phase transitions from localized to delocalized dynamics. In non-Hermitian
(including PT-symmetric) systems, non-trivial topology
of eigenmodes and unusual transport properties in the vicinity of
exceptional points are currently under investigation. Basic non-Hermitian
or stochastic dynamics have so far been studied in the context
of microwave, optical, atomic and electron
waves. However, the concepts of non-Hermiticity and quantum chaos
remain largely separated from each other, owing to the lack of a simple
quantum system in which both features would be readily accessible.
Moreover, it is challenging to produce artificial complex potentials with
gain and loss for classical waves, as well as to observe nanoscopic
electron states in solids.

Microcavity exciton-polaritons represent a unique quantum macroscopic
system, which combines the main advantages of light and matter
waves. Being bosons, exciton-polaritons can display collective
quantum behaviour, Bose-Einstein condensation (BEC), when they 
occupy a single-particle quantum state in massive numbers. Exciton- 
polaritons have provided a very accessible system for studies of col- 
lective quantum behaviour because they condense at temperatures 
ranging from 10 K to room temperature (compared to nanokelvins
for neutral atoms) and do not require painstaking isolation from 
the environment. 

The schematics of exciton-polariton condensation under continuous-wave
incoherent optical excitation conditions are shown in
Fig. 1a. The optical pump, far detuned from the exciton resonance
in the cavity, effectively creates an incoherent reservoir of ‘hot’, 
exciton-like polaritons. Above a threshold density of the reservoir, 
relaxation and stimulated scattering into the coherent BEC state of 
exciton-polaritons dominate the dynamics. The continuously pumped 
condensate decays and releases coherent photons, which escape the 
cavity carrying all information about the condensed state. The interactions
between the reservoir and condensed exciton-polaritons
are responsible for the formation of effective pump-induced potentials.
Thus, the macroscopic matter wavefunction is shaped by an
optical pump and spatially resolved via free-space optical microscopy. 
This enables us to clearly observe and control non-Hermitian and 
irregular quantum dynamics. 

We use a structured optical pump to create a non-Hermitian
potential in the shape of a Sinai billiard with a circular defect of radius
R (see Fig. 1b) for condensed exciton-polaritons (see Methods for
details). In our experiment, the billiard has ‘soft’ (inelastic) walls of a
finite width and height. The main properties of eigenstates of the
exciton-polariton condensate in the billiard can be described by a linear
Schrödinger equation with a complex two-dimensional potential
V(r) = V′(r) + iV″(r). Here the real part of the potential, V′(r) ∝ P(r),
is the potential barrier shaped as a Sinai billiard boundary with a
Gaussian envelope. This barrier, proportional to the optical pump rate P(r),
is induced by the strong repulsive interaction between the excitonic
reservoir populated by the pump and the polariton BEC. The imaginary
part of the potential, V″(r) ∝ P(r) − γ, combines the gain profile produced
by the same optical pump P(r) with the spatially uniform loss γ due to polariton
decay (Fig. 1b). Despite the strong polariton-polariton interactions, the
corresponding nonlinearity mostly affects the relative population of the
energy eigenstates, as well as the overall blueshift (see Methods).

Changing the radius of the defect, R, varies the geometry of the
billiard and hence affects the energy levels. Figures 1c and d show the
experimentally measured and numerically computed energy spectra
E(R) of the first 11 levels as a function of R. Variations of the shape of
the 2D potential tune eigenvalues of different modes at different rates,
and as a result some energy levels approach each other at certain values
of R. It can be seen (Fig. 1c, d) that multiple degeneracies (or near-degeneracies)
appear in the spectrum. In a ‘hard-wall’ Hermitian Sinai
billiard, the proliferation of degeneracies is a signature of the transition
from regular to chaotic dynamics. Although our exciton-polariton
billiard has ‘soft’ walls and can generically exhibit mixed regular-chaotic
behaviour, we clearly observe multiple degeneracies similar
to the ‘hard-wall’ case. In Hermitian billiards, the levels generically
avoid crossing (that is, they anti-cross) in the vicinity of degeneracies,
which correspond to the average level repulsion and Wigner distribution
of the nearest-neighbour energy spacings. In contrast, the non-Hermitian
systems can exhibit both crossings and anti-crossings of
levels. This is because the energy eigenvalues in non-Hermitian
systems are complex: the real and imaginary parts correspond to
the real energies and linewidths of the modes, respectively. A crossing
of the energies is accompanied by an anti-crossing of the linewidths
and vice versa. In our experiment, we measure the spectral profile of
the cavity photoluminescence at a particular spatial position and
extract both peak energies and widths of spectral resonances (see
Methods). Crossings as well as anti-crossings of real energy levels
are clearly seen both in experiments (Fig. 1c) and numerical simulations
(Fig. 1d).

To observe the transition between crossing and anti-crossing for the
same near-degenerate pair of eigenvalues, a second control parameter
needs to be varied. In our exciton-polariton billiard, this additional
parameter is the thickness, d, of the billiard walls. Provided the internal
area of the billiard remains unchanged, this parameter does not affect
the geometry of the billiard and primarily controls the imaginary part
V″ of the non-Hermitian potential barrier. Figure 2 shows one pair
of billiard modes highlighted in Fig. 1c in the vicinity of a near-degeneracy
for two values of the control parameter d. One can clearly
see the anti-crossing (crossing) behaviour of the real (imaginary) parts
of the complex eigenenergies in the billiard with thick walls (Fig. 2a, c)
and the opposite behaviour for the thin-wall billiard (Fig. 2b, d).

Figure 1 | Non-Hermitian exciton-polariton Sinai billiard and its spectrum.
a, Exciton-polariton dispersion showing the upper and lower branches (solid
lines) formed owing to hybridization of the cavity photon and exciton modes
(dashed lines). The incoherent excitonic reservoir is continuously replenished
by the optical pump (represented by the cyan arrow) and ‘feeds’ the polariton
BEC (black arrow). The polariton BEC decays into cavity photoluminescence
(orange arrow). b, Schematics of the exciton-polariton Sinai billiard formed
in the plane of a quantum well embedded into the microcavity (see Methods).
The barrier is induced by the optical pump via the excitonic reservoir, and the
square modulus of the wavefunction of the confined polariton BEC (shown in
greyscale inside the billiard) is imaged via the photoluminescence. The billiard
dimensions are fixed as W = 14 μm, L = 23 μm, the radius of the defect R is
varied from 0 to W, and the thickness of the walls d is varied from 3 μm to 7 μm
(see Methods). c, d, Experimentally measured (c) and numerically simulated
(d) spectra E(R/W) for the first 11 modes of the billiard in arbitrary units (a.u.).
With growing R, numerous degeneracies and quasi-degeneracies proliferate in
the grey area, which is a signature of the transition to quantum chaos in the
Hermitian Sinai billiard. Topological properties of two near-degenerate modes
(red and blue in the orange rectangles) are analysed in detail in Figs 2-4.

Importantly, the energy-resolved real-space imaging of the photo- 
luminescence provides all the information about complex eigenvalues 
as well as the spatial structure of the eigenmodes (wavefunctions). In 
particular, the levels shown in Fig. 2 correspond (at R = 0) to the third 
mode with three horizontal lobes and the fourth mode with two ver- 
tical lobes. The experimentally imaged and calculated spatial profiles 
of these eigenmodes are shown as insets in Fig. 2a, b along the eigen- 
energy curves. We observe that the two modes are hybridized and 
therefore change their spatial profiles in the near-degeneracy region, 
and ‘exchange’ their spatial profiles after passing it. 

Figure 2 | Crossing and anti-crossing for two near-degenerate modes.
These modes are boxed in Fig. 1c, d. a-d, Experimentally observed anti-crossing
(a) and crossing (b) of eigenenergies of two modes in the spectrum of
the exciton-polariton Sinai billiard with varying parameter R (see Fig. 1) for
thick, d ≈ 6 μm (a, c), and thin, d = 4 μm (b, d), billiard walls; d_EP is the
value corresponding to the exceptional point. Panels c and d show the
corresponding crossing and anti-crossing of the linewidths (that is, imaginary
parts of the complex eigenvalues). The error bars in a-d originate from
numerical fitting of the spectroscopic data (see Methods). The upper (lower)
inset panels in a and b illustrate the numerically calculated (experimentally
imaged) spatial structure of the eigenmodes at different values of the
parameter R. Details of the hybridization region are given in Methods.

Figure 3 | Eigenvalues of a two-level non-Hermitian model in the vicinity
of the exceptional point. a, b, Real (a) and imaginary (b) parts of the
eigenvalues λ₁,₂ of the model (equation (1)) as functions of two parameters,
δE and δΓ. The exceptional point (EP) is shown in magenta. The crossing
and anti-crossing of the real and imaginary parts of the eigenvalues as functions
of δE, for δΓ < δΓ_EP and δΓ > δΓ_EP, are shown in red and blue. This is
in correspondence with the experimentally observed behaviour in Fig. 2.
Traversing along the green contour encircling the exceptional point in the
(δE, δΓ) plane reveals the non-trivial topology of eigenmodes, as shown
in Fig. 4.

The behaviour of two billiard modes in the vicinity of a degeneracy
can be described by a simple model of a two-level system with an
effective coupling (see Methods). The corresponding non-Hermitian
Hamiltonian reads:

$$\hat{H} = \begin{pmatrix} \tilde{E}_1 & q \\ q^{*} & \tilde{E}_2 \end{pmatrix}, \qquad \tilde{E}_{1,2} = E_{1,2} - i\Gamma_{1,2} \tag{1}$$

Here Ẽ₁,₂ are the complex eigenvalues of two uncoupled modes (with
E₁,₂ being the real energies and Γ₁,₂ being the decay/gain rates),
whereas q characterizes the coupling between these two modes (the
star stands for complex conjugation). We will also use the mean
complex energy Ẽ = (Ẽ₁ + Ẽ₂)/2 = E − iΓ, and the complex energy
difference δẼ = (Ẽ₁ − Ẽ₂)/2 = δE − iδΓ. The eigenvalues of the
Hamiltonian (equation (1)) are λ₁,₂ = Ẽ ± √(δẼ² + |q|²); their real and
imaginary parts, which depend on the parameters δẼ = (δE, δΓ), are
shown in Fig. 3. These complex eigenvalues coalesce, λ₁ = λ₂, at the
exceptional points (EPs), where iδẼ_EP = ±|q|. At these points,
the eigenstates also coalesce and form a single chiral mode.
Assuming that the coupling constant q is fixed, the exceptional
points appear in the parameter plane as (δE_EP, δΓ_EP) = (0, ±|q|).
We assume δΓ > 0 in our range of parameters, so that there is only
one exceptional point in the domain of interest. The exceptional point
can be encircled in the (δE, δΓ) plane by varying these two parameters,
as seen in Fig. 3.

Figure 4 | Observation of the topological Berry phase acquired after circling
around the exceptional point in the parameter plane. Transmutations of
spatial distributions (black-and-white panels) of the selected eigenmode (from
the pair shown in Fig. 2) along the closed contour in the parameter space
(R, d) ~ (δE, δΓ) encircling the exceptional point (see Fig. 3). Parameters
are not varied in time during the measurements, and each distribution
corresponds to the stationary mode at the corresponding parameter values.
a, b, The first loop (a) shows the transition to a different branch (mode)
through the hybridization region (see explanations in text); the second loop
(b) returns the mode to the original one with a π topological phase shift. The
phases (colour panels) are inferred from comparison with the numerically
calculated modes. The modes corresponding to the ‘start’ and ‘end’ points of the
loop on the red (blue) branch in Figs 2a, b and 3a are boxed in red (blue).

Two parameters of the model, (δE, δΓ), approximately correspond
to the varying parameters (R, d) of our exciton-polariton billiard. The
radius R mostly affects the real part of the potential, V′(r), and hence
the energy difference between the modes. Increasing R corresponds to
a tighter spatial confinement and therefore to increasing δE. In turn,
the thickness d of the billiard walls controls the gain/loss profile V″(r).
Different modes have different spatial overlaps with the imaginary
potential V″(r), and, therefore, are characterized by different integral
(spatially averaged) dissipation parameters Γ (see Methods). In our
case, increasing d corresponds to decreasing δΓ. The effective coupling
q in our model (equation (1)) is determined by the spatial overlap
between the two modes away from the hybridization region. The
red and blue curves in Fig. 3 show the crossing/anti-crossing behaviour
of the real and imaginary parts of the eigenvalues versus the energy
difference δE for two values of the dissipation parameter: δΓ < δΓ_EP
and δΓ > δΓ_EP. This behaviour is perfectly consistent with that in the
experimental Fig. 2, which means that our range of varying parameters
includes the exceptional point.

The structure of the complex eigenvalues in the vicinity of the
exceptional point reveals non-trivial topology of a branch-point
type, shown in Fig. 3. Therefore, continuous encircling of the
non-Hermitian degeneracy in the two-parameter plane (for example,
along the green contour in Fig. 3) results in the transition to the other
branch. When the contour is traversed twice, we return to the original
mode, most significantly with a topological phase shift of π. This
phase shift is the manifestation of the Berry phase resulting from
encircling of a non-Hermitian degeneracy in a two-dimensional parameter
space. We use the method suggested in an earlier microwave
experiment to trace the above topological structure of two modes
in the vicinity of the exceptional point. We compare the eigenmodes at
neighbouring values of parameters (δE, δΓ) ~ (R, d) along the contour
encircling the exceptional point (see Fig. 3). Notably, we do not consider
adiabatic evolution of modes due to variations of the parameters
(R, d) in time; such evolution would be accompanied by unavoidable
non-adiabatic transitions in the non-Hermitian case. Rather, we
examine the natural topological structure and geometrical connection
of stationary modes depending on the parameter values.
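
The crossing/anti-crossing behaviour of the two-level model can be checked numerically. The following is a minimal sketch, assuming the eigenvalue expression λ₁,₂ = Ẽ ± √(δẼ² + |q|²) with the mean complex energy Ẽ set to zero and a coupling |q| = 1 in arbitrary units; the function names `eigenvalues` and `gaps_at_resonance` are hypothetical and do not come from the original analysis.

```python
import numpy as np


def eigenvalues(dE, dG, q=1.0):
    """Eigenvalues of the two-level model with zero mean complex energy.

    dE and dG play the roles of the real energy difference and the
    dissipation difference; q is the (assumed fixed) coupling.
    """
    d = dE - 1j * dG                      # complex energy difference
    root = np.sqrt(d * d + abs(q) ** 2)   # principal complex square root
    return root, -root


def gaps_at_resonance(dG, q=1.0):
    """Splitting of the real and imaginary parts at dE = 0."""
    lam_plus, lam_minus = eigenvalues(0.0, dG, q)
    real_gap = np.abs(lam_plus.real - lam_minus.real)
    imag_gap = np.abs(lam_plus.imag - lam_minus.imag)
    return real_gap, imag_gap


for dG in (0.5, 1.0, 2.0):  # below, at, and above the exceptional point
    print(dG, gaps_at_resonance(dG))
```

In this sketch, for dG below |q| the real parts stay split at dE = 0 (anti-crossing) while the imaginary parts coincide (crossing); for dG above |q| the roles are exchanged, and at dG = |q| both gaps vanish, which is the exceptional point.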


Figure 4 depicts the experimentally measured intensities and the 
corresponding numerically simulated phase profiles of the two modes 
from Fig. 2 for the parameter values lying on the contour encircling the 
exceptional point (Fig. 3). In Fig. 4a, we start on the upper branch (blue 
in Figs 2a and 3a) at R < R_EP, d > d_EP and trace the eigenmode transmutation
as the radius is increased to R > R_EP. This takes us from the
vertical two-lobe mode, through the anti-crossing, to the horizontal 
three-lobe mode (still on the blue upper branch). Then, we decrease 
the thickness to d < d_EP and stay on the same horizontal three-lobe
mode, which now corresponds to the red branch in Figs 2b and 3a. 
Next, reducing the radius R takes this mode through the crossing and 
recovers its three-lobe structure. Increasing d closes the loop. Thus, the 
continuous transformation brought us from the vertical two-lobe 
mode (‘start’ in Fig. 4a) to the horizontal three-lobe mode (‘end’ in 
Fig. 4a) at the same values of the parameters. Repeating this traverse 
one more time (Fig. 4b) returns us to the original vertical two-lobe 
mode, but now with the π topological phase shift (clearly seen in the
simulated phase profiles). The experimental density distribution of the 
modes is in very good agreement with that calculated numerically. 
Therefore we can associate the phase structure of the simulated spatial 
modes with the experimental mode profiles.
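
The mode switching after one loop and the sign change after two loops can be reproduced in the two-level model by following one stationary eigenmode from point to point along a closed contour, in the spirit of comparing modes at neighbouring parameter values. This is only an illustrative sketch, not the experimental procedure: it assumes a real coupling q = 1 (so the Hamiltonian is complex symmetric), zero mean energy, a circular contour of radius 0.5 around the exceptional point at (δE, δΓ) = (0, q), and the normalization vᵀv = 1 without complex conjugation. All function names are hypothetical.

```python
import numpy as np


def hamiltonian(dE, dG, q=1.0):
    """Two-level non-Hermitian Hamiltonian with zero mean energy, real q."""
    a = dE - 1j * dG
    return np.array([[a, q], [q, -a]], dtype=complex)


def c_normalise(v):
    """Normalise so that v.T @ v = 1 (no complex conjugation)."""
    return v / np.sqrt(v @ v)


def encircle(loops, rho=0.5, q=1.0, steps_per_loop=400):
    """Follow one eigenmode around the exceptional point at (0, q)."""
    thetas = np.linspace(0.0, 2.0 * np.pi * loops, steps_per_loop * loops + 1)
    lam_prev, vec_prev = None, None
    for theta in thetas:
        dE = rho * np.cos(theta)
        dG = q + rho * np.sin(theta)
        w, V = np.linalg.eig(hamiltonian(dE, dG, q))
        if lam_prev is None:
            k = int(np.argmax(w.real))                 # starting branch
        else:
            k = int(np.argmin(np.abs(w - lam_prev)))   # nearest eigenvalue
        v = c_normalise(V[:, k])
        # Fix the remaining sign freedom by continuity with the previous step.
        if vec_prev is not None and (vec_prev @ v).real < 0:
            v = -v
        if lam_prev is None:
            lam_start, vec_start = w[k], v
        lam_prev, vec_prev = w[k], v
    return lam_start, vec_start, lam_prev, vec_prev


lam0, v0, lam1, v1 = encircle(loops=1)
print("one loop, eigenvalue switched branch:", np.allclose(lam1, -lam0))
lam0, v0, lam2, v2 = encircle(loops=2)
print("two loops, eigenvalue restored:", np.allclose(lam2, lam0))
print("two loops, eigenvector sign flipped:", np.allclose(v2, -v0))
```

With these assumptions, one loop moves the tracked mode onto the other branch, and a second loop restores the eigenvalue while the continuously tracked eigenvector returns with the opposite sign, that is, a phase shift of π.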

Thus, we have demonstrated the creation of highly controllable 
complex (non-Hermitian) potentials for exciton-polaritons, and 
implemented a chaotic non-Hermitian exciton-polariton billiard with 
multiple spectral degeneracies. We have provided detailed experi- 
mental observations of the non-trivial behaviour of complex eigenva- 
lues and eigenmodes in the vicinity of an exceptional point. These 
include crossing/anti-crossing transitions as well as mode switching 
and topological Berry phase when encircling the exceptional point 
in the two-parameter plane. Our results show that the inherent 
non-Hermitian nature of exciton-polaritons determines their basic 
properties, which are crucial for transport and quantum information 
processing. Therefore, these features should be taken into account 
in future studies and applications involving confinement and 
manipulation of exciton-polaritons. Most importantly, this complex 
quantum dynamics can bring novel functionality to polariton-based 
devices operating at the interface between photonics and electronics. 
Generally, exciton-polaritons offer a novel macroscopic quantum plat- 
form for studies of non-Hermitian physics and quantum chaos at the 
confluence of light and matter. 
METHODS


Experimental setup. The semiconductor sample used in the experiment is a 
GaAs/AlGaAs microcavity containing 12 quantum wells (QWs) (~13 nm wide 
each) sandwiched between distributed Bragg reflector mirrors (32/36 mirror 
pairs). To achieve the strong interaction regime between cavity photons and 
quantum-well excitons, the quantum wells are distributed in the sample via
three sets of four located at the anti-nodes of the photon mode. The cavity photon 
mode is red-detuned by 2.8 meV from the exciton resonance at 1.546 eV, resulting 
in the exciton-polariton dispersion schematically shown in Fig. 1a. The sample is 
mounted on a cold finger inside a continuous flow microscopy cryostat and 
maintained at 5.6 K. 

A schematic of the experimental apparatus is shown in Extended Data Fig. 1. 
The exciton-polariton condensate is formed by illuminating the sample by a quasi- 
continuous, off-resonant, linearly polarized pump beam derived from a continu- 
ous wave (CW) Ti:sapphire laser operating at 732 nm. The threshold power for the 
condensation is ~0.079 mW μm⁻². To minimize heating of the sample, the pump
beam is chopped by an acoustic optical modulator (AOM). We use a digital 
micromirror device (DMD) to engineer the spatial pump profile in the shape of 
a Sinai billiard shown in Fig. 1b, which is then re-imaged onto the sample at 
normal incidence through a high numerical aperture (NA) microscope objective. 

Owing to the continuous decay of the exciton-polaritons, coherent photons
escape the cavity as a photoluminescence signal and carry all the information
about the condensate. The photoluminescence is then collected via the microscope
objective and analysed using the CCD camera and spectrometer (Extended
Data Fig. 1). We reconstruct the spatial modes by scanning the real space imaging 
across the slit of the spectrometer. 
Creating exciton-polariton billiards. The DMD mirror is programmed to reflect 
the spatial pattern shown in Extended Data Fig. 2, thus creating a structured pump 
beam in the shape of a Sinai billiard. The pump creates an inhomogeneous
distribution of reservoir excitons in the plane of the quantum well, therefore 
inducing an effective potential for the condensed exciton-polaritons.

The two parameters of the billiard controlling the non-Hermitian dynamics of 
exciton-polaritons are the radius of the round corner (defect), R, and the thickness 
of the walls, d. The latter is different on the different sides of the perimeter due to 
the shape of the laser beam illuminating the DMD. Throughout the main text, we 
consider a continuous change of R (0 < R/W < 1), but only two modifications of d 
(shown in Extended Data Fig. 2). 

We have verified that, for any R, when the thickness of the walls is varied
within our experimental range, the pump power density remains approximately
constant. For the ‘thin’ and ‘thick’ wall configurations shown in Extended
Data Fig. 2, the values are 0.110 ± 0.0033 mW μm⁻² (Extended Data Fig. 2a)
and 0.117 ± 0.0047 mW μm⁻² (Extended Data Fig. 2b), respectively. This effectively
means that the height of the billiard potential walls, defined by the pump
power, remains the same. Since the internal area and hence geometry of the billiard 
does not depend on d either, this leads us to conclude that the wall thickness 
controls mainly the imaginary part of the billiard potential. 
Spectroscopy of the billiard. Above the condensation threshold, exciton-polar- 
itons occupy multiple energy levels of the billiard potential, and in our experiments 
we comfortably resolve approximately the first 15 levels in the energy versus 
position spectrum. As the radius of the defect in the Sinai billiard grows, the area 
of the potential confining exciton-polaritons shrinks, so that the energy levels are 
blueshifted (see Fig. 1c, d). The spectral line profiles measured at fixed spatial 
positions in the vicinity of degeneracy highlighted in Fig. 1c, d are shown in 
Extended Data Fig. 3. The line profiles obtained for several values of the defect 
radius within the range (0.4 < R/W < 0.65) are plotted on the same plot and their 
relative blueshift is represented by the offset on the intensity axis. 

Positions of the individual energy levels (Fig. 2a, b) for different values of R are
derived from the spectroscopic peaks, as schematically shown in Extended Data
Fig. 3, and the linewidths (Fig. 2c, d) are determined by the numerical fitting of the 
spectral profile. The errors indicated in Fig. 2 arise from the numerical fitting 
procedure and therefore are very small. 
Modelling of the billiard. The full dynamics of the exciton-polariton condensate
subject to off-resonant, incoherent optical pumping can be described by the generalized
complex Gross-Pitaevskii (or Ginzburg-Landau) equation for the
condensate wavefunction, ψ:

$$i\hbar\frac{\partial\psi(\mathbf{r},t)}{\partial t} = \left[-\frac{\hbar^{2}}{2m}\nabla^{2} + (g - i\gamma_{nl})|\psi|^{2} + (g_R + i\hbar R)\,n_R(\mathbf{r}) - i\hbar\gamma\right]\psi + i\hbar\,\mathfrak{R}[\psi(\mathbf{r},t)] \tag{2}$$

Here m is the effective mass of the lower polariton, g is the polariton-polariton
interaction strength, g_R is the strength of interaction between the reservoir and
condensed polaritons, R is the rate of stimulated scattering into the condensed
state, and γ is the spatially homogeneous decay rate of polaritons. The reservoir
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density distribution nz(r) x P(r) is defined by the rate of reservoir (excitonic) 
polariton injection per unit area and time, P(r). The parameter 7,; entering equa- 
tion (2) characterizes gain saturation and, in general, depends on the spatial 
distribution of the pump. In our numerical calculations, we take the ,,, to be small 
and spatially homogeneous due to the weak overlap between the condensate and 
the pumping area. 
The model, equation (2), was initially suggested phenomenologically*? and sub- 
sequently derived from the semiclassical Maxwell-Bloch equations™. It qualita- 
tively coincides with the generalized open-dissipative Gross-Pitaevskii model’® 
augmented with the rate equation for the excitonic reservoir density: 

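$$\frac{\partial n_R}{\partial t} = P(\mathbf{r}) - \left(\gamma_R + R|\psi|^{2}\right) n_R$$

in the regime of near-threshold pumping”. In this limit, the steady state reservoir 
density distribution can be expressed as 
$n_R \approx P(\mathbf{r})/\gamma_R - R P(\mathbf{r})|\psi|^{2}/\gamma_R^{2} = n_R(\mathbf{r}) - \gamma_{nl}\hbar^{-1}R^{-1}|\psi|^{2}$, 
where $\gamma_R$ is the decay rate of reservoir polaritons. 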
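
As a rough numerical check of the near-threshold limit described above, the sketch below compares the exact steady state of the reservoir rate equation, n_R = P/(γ_R + R|ψ|²), with its first-order expansion in R|ψ|²/γ_R. The function names and the numbers in the usage lines are illustrative assumptions and are not taken from the paper.

```python
def reservoir_steady_state(pump, gamma_r, rate, density):
    """Exact steady state of dn_R/dt = P - (gamma_R + R*|psi|^2) * n_R."""
    return pump / (gamma_r + rate * density)


def reservoir_near_threshold(pump, gamma_r, rate, density):
    """First-order expansion in R*|psi|^2 / gamma_R (near-threshold pumping)."""
    return pump / gamma_r - rate * pump * density / gamma_r**2


# Illustrative, dimensionless values (assumed): weak condensate density.
exact = reservoir_steady_state(pump=1.0, gamma_r=10.0, rate=0.01, density=5.0)
approx = reservoir_near_threshold(pump=1.0, gamma_r=10.0, rate=0.01, density=5.0)
relative_error = abs(exact - approx) / exact
```

The relative error scales as (R|ψ|²/γ_R)², so the expansion is only a reasonable stand-in while the condensate density stays small compared with γ_R/R.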

The phenomenological energy relaxation*****°, which is essential to 
adequately model the multi-mode nature of the condensate*, is taken in the 
following form**”*: 

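$$\Re[\psi(\mathbf{r},t)] = \alpha\, n_R \left[\mu(\mathbf{r},t) - i\hbar\frac{\partial}{\partial t}\right]\psi(\mathbf{r},t)$$

where $\alpha$ is the energy relaxation rate, and $\mu(\mathbf{r},t)$ is a local chemical potential of 
the condensate. 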

We use equation (2) to obtain the structure of the spatial modes of the exciton- 
polariton condensate corresponding to peaks of the energy spectrum. The 
parameters of the model used for our dynamical simulations are as follows: 
m=5xX 10 °me where m, is the free electron mass, g= 2 X 10°? meV yum?, 
&r= 2g, hR=6X 10 *meVum?, y=0.1 ps), yy = 0.3g, a=1.2X 10-3 um? 
ps ‘meV. The effective potential height is max(V’) = 2.25 meV, and the bil- 
liard wall profile given by the reservoir density distribution, n,(r), is convoluted 
with a Gaussian profile to account for the ‘soft’ edges of the potential created by the 
optical excitation and exciton diffusion. 

The spatial modes computed numerically using the fully nonlinear, open-dis- 
sipative dynamical model, equation (2), are presented in the bottom row of 
Extended Data Fig. 4. For comparison, the middle row of Extended Data Fig. 4 
shows the single-particle eigenstates of the complex linear potential induced by the 
excitonic reservoir: $V(\mathbf{r}) = V' + iV'' = g_R n_R(\mathbf{r}) + i\hbar[R n_R(\mathbf{r}) - \gamma]$, with both real 
and imaginary parts $V'$, $V''$ proportional to the pumping rate*®, P(r). One can 
see that the condensate dynamics described by equation (2) effectively populates 
the eigenstates of the linear complex effective potential. The validity of our model 
is confirmed by the excellent agreement with the experimental images of the 
billiard modes presented in the top row of Extended Data Fig. 4. 
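
To illustrate what the single-particle eigenstates of a complex linear potential look like in practice, the following minimal sketch diagonalizes a dimensionless one-dimensional analogue, H = −d²/dx² + V′(x) + iV″(x), by finite differences. The one-dimensional geometry, the hard-wall boundaries, the Gaussian pump profile and all numerical values are assumptions made for illustration; this is not the two-dimensional solver used in the paper.

```python
import numpy as np


def complex_potential_modes(v_real, v_imag, dx):
    """Eigenvalues and eigenvectors of H = -d^2/dx^2 + V' + iV'' on a 1D grid.

    Hard-wall (Dirichlet) boundaries are assumed; all units are dimensionless.
    """
    n = len(v_real)
    off = np.ones(n - 1)
    laplacian = (np.diag(off, -1) - 2.0 * np.eye(n) + np.diag(off, 1)) / dx**2
    hamiltonian = -laplacian + np.diag(np.asarray(v_real) + 1j * np.asarray(v_imag))
    energies, modes = np.linalg.eig(hamiltonian)
    order = np.argsort(energies.real)  # sort modes by the real part of the energy
    return energies[order], modes[:, order]


def reservoir_overlap(mode, v_imag):
    """Normalized overlap of |mode|^2 with the imaginary potential V''."""
    weight = np.abs(mode) ** 2
    return float(np.sum(v_imag * weight) / np.sum(weight))


# Hypothetical 1D 'billiard': two pumped walls confining an unpumped well.
x = np.linspace(-5.0, 5.0, 201)
dx = x[1] - x[0]
pump = np.exp(-((x + 4.0) / 0.5) ** 2) + np.exp(-((x - 4.0) / 0.5) ** 2)
v_real = 20.0 * pump        # V' proportional to the pump: repulsive walls
v_imag = 2.0 * pump - 0.1   # V'': gain under the pump minus a uniform loss

energies, modes = complex_potential_modes(v_real, v_imag, dx)
```

In this sketch the imaginary part of each eigenenergy equals the normalized overlap of the mode density with V″, so modes that penetrate further into the pumped walls acquire a larger gain.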

In agreement with previous studies”, the nonlinearity due to exciton-polariton 
interactions strongly determines the relative population of the eigenstates, as well 
as the overall blueshift of the eigenenergies. The eigenenergies are complex, and so 
the spectral linewidths may exceed the level separation. For this reason, in our 
experiment some of the higher-order energy-filtered wavefunctions represent 
superpositions of neighbouring eigenstates. For example, the seventh mode mea- 
sured in the experiment (last column in Extended Data Fig. 4) is, in fact, a super- 
position of eigenstates eight and nine, as revealed by the comparison with the 
numerically calculated modes. In contrast, the lower-order modes in Extended 
Data Fig. 4 represent almost pure eigenstates, having a very weak (less than 10%) 
admixture of the neighbouring eigenstates. 

Hybridization of modes. Hybridization of modes occurs in the vicinity of cross- 
ing and anti-crossing of the energy levels in Fig. 2a and b. In these regions, 
the billiard modes are different in shape to the uncoupled modes away from the 
(near-)degeneracy. In experiments, it is hard to spectrally resolve pure modes in 
the hybridization region since their spectral linewidths exceed the peak separation. 
Therefore, what is experimentally imaged and shown in the insets of Fig. 2a and b 
is a superposition of two modes. This is especially true for Fig. 2b, where the 
spectral peaks (but not the linewidths) precisely coincide at the crossing, so that 
in the experiment we can only image a single mode corresponding to a single peak. 

To match the spatial distributions obtained in the experiment with those found 
numerically, we plot superpositions of the pure eigenstates found in numerical 
simulations: $\psi_s = \alpha\varphi_3 + \beta\varphi_4 e^{i\theta}$, where $\varphi_{3,4}$ are the pure eigenstates 3 and 4, $\alpha$ and $\beta$ 
are their relative amplitudes, and $\theta$ is the relative phase. We find that only the 
relative phase $\theta = \pi/2$ can produce a superposition that fits well with the 
experiment. These spatial modes (pure and superposition states) for the thick and thin 
billiard in the hybridization region (anti-crossing and crossing of eigenenergies, 
respectively) are shown in Extended Data Fig. 5. 
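
The mode-mixing step described above can be sketched as follows. The stand-in modes are real-valued sine functions on a unit interval, which is an assumption made purely for illustration (the billiard eigenstates are two-dimensional and in general complex), and the populations 0.85 and 0.15 are borrowed from panel c of Extended Data Fig. 5.

```python
import numpy as np


def superposition_density(mode_a, mode_b, pop_a, pop_b, phase):
    """Density |alpha*mode_a + beta*mode_b*exp(i*phase)|^2 for given populations."""
    alpha, beta = np.sqrt(pop_a), np.sqrt(pop_b)
    psi = alpha * mode_a + beta * mode_b * np.exp(1j * phase)
    return np.abs(psi) ** 2


# Real-valued stand-ins for eigenstates 3 and 4 (assumed, not the billiard modes).
x = np.linspace(0.0, 1.0, 401)
mode_3 = np.sqrt(2.0) * np.sin(3.0 * np.pi * x)
mode_4 = np.sqrt(2.0) * np.sin(4.0 * np.pi * x)

quadrature = superposition_density(mode_3, mode_4, 0.85, 0.15, np.pi / 2)
in_phase = superposition_density(mode_3, mode_4, 0.85, 0.15, 0.0)
```

For real-valued modes, a relative phase of π/2 removes the interference term, so the density reduces to the population-weighted sum of the two mode densities, whereas an in-phase superposition does not.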

Note that this mode mixing is performed only in the hybridization region. Away 
from this region, the experimentally imaged and numerically calculated modes 
match extremely well. Importantly, in Fig. 4, in order to perform a reliable phase 
extraction from numerically found modes away from the (near-)degeneracy, it is 
absolutely necessary to trace the continuous variation of phase of the pure modes 
as we pass the hybridization region. For this reason, we did not mix numerically 
found pure modes to match experimentally imaged spatial distributions of super- 
position states. This explains visible discrepancies between the spatial structure of 
numerically calculated and experimentally imaged modes in the hybridization 
regions in Fig. 4. 

Coupled-mode model. The behaviour of any two non-Hermitian modes of the 
billiard potential near the degeneracy point can be described by a standard 
coupled-mode model written in the dimensionless form as follows: 

$$i\frac{\partial\psi_{n,n'}(\mathbf{r},t)}{\partial t} = \left[-\nabla^{2} + V'(\mathbf{r}) + iV''(\mathbf{r})\right]\psi_{n,n'} + Q\,\psi_{n',n} \qquad (3)$$

where Q characterizes the coupling strength between the states n and n′. 
Separating the temporal and spatial dependence of the wavefunctions, 
$\psi_n = A_n(t)\varphi_n(\mathbf{r})$, substituting this ansatz into equation (3), and integrating out 
the spatial degrees of freedom, leads us to the eigenvalue equation (1) in the main 
text, where (n, n′) = (1, 2). The real energies of the modes away from the 
degeneracy in equation (1) are defined by the shape of the billiard potential, 
$[-\nabla^{2} + V'(\mathbf{r})]\varphi_n(\mathbf{r}) = E_n\varphi_n(\mathbf{r})$, the complex parts of the eigenenergies are given 
by the overlap between the billiard modes and the exciton reservoir, 
$\Gamma_n \propto \int V''(\mathbf{r})|\varphi_n(\mathbf{r})|^{2}\,d^{2}r$, and the off-diagonal matrix elements in equation (1) 
are determined by the degree of spatial overlap between the two modes, 
$q \propto \int \varphi_n^{*}(\mathbf{r})\varphi_{n'}(\mathbf{r})\,d^{2}r$. Here we assume that the uncoupled modes are properly 
normalized. 
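
A two-mode eigenvalue problem of this kind can be explored numerically with a few lines of code. The sketch below assumes the matrix form [[E₁ − iΓ₁, q], [q*, E₂ − iΓ₂]]; equation (1) itself is given in the main text and is not reproduced here, so the sign convention for Γ and all numerical values are assumptions.

```python
import numpy as np


def coupled_mode_energies(e1, e2, gamma1, gamma2, q):
    """Complex eigenvalues of an assumed two-mode non-Hermitian matrix.

    The diagonal holds E_n - i*Gamma_n; the off-diagonal elements are q and q*.
    """
    matrix = np.array(
        [[e1 - 1j * gamma1, q], [np.conj(q), e2 - 1j * gamma2]], dtype=complex
    )
    return np.linalg.eigvals(matrix)


# Degenerate real energies, unequal linewidths (illustrative values only).
strong = coupled_mode_energies(0.0, 0.0, 0.3, 0.1, 0.2)   # |q| > |Gamma1 - Gamma2| / 2
weak = coupled_mode_energies(0.0, 0.0, 0.3, 0.1, 0.05)    # |q| < |Gamma1 - Gamma2| / 2
```

With these values, the strongly coupled pair splits in the real part while sharing one linewidth (an anti-crossing of energies), and the weakly coupled pair keeps coincident real parts while the linewidths split (a crossing of energies). The two regimes meet where |q| equals half the linewidth difference.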
dimensional resonator in the plane of the quantum well. This approach is con- 
Extended Data Figure 1 | Diagram of the experimental apparatus. See Methods for details. 
Extended Data Figure 2 | Schematics of the optically induced billiard potential with two different wall thicknesses. a, Thin walls; b, thick walls. The active 
regions corresponding to the optical pump are shown in black, and we note that the enclosed area does not change with wall thickness. 
Extended Data Figure 3 | Effect of wall thickness on spectroscopic line profiles of the Sinai billiard. a, b, Profiles are shown in the vicinity of the … 
(b) walls. The thick lines demonstrate the principle of data extraction for anti-crossing (a) and crossing (b) of the energy levels corresponding to those … 
Extended Data Figure 4 | Spatial density distribution of the first seven simultaneously populated lowest-energy modes of the Sinai billiard. Spatial 
density distributions were obtained from the thick-wall setup (Extended Data Fig. 2b) with R/W = 0.35. Top row, experimentally imaged; middle row, 
calculated using the effective linear potential model; bottom row, calculated using the full dynamical model given by equation (2). 
Extended Data Figure 5 | Spatial modes in the hybridization regions. 
a-g, Calculated spatial modes; each panel shows the modulus squared of the 
wavefunction (left) and the wavefunction’s phase distribution (right, colour 
coded). a, b, e, f, Numerically calculated pure spatial eigenstates (modes 3 
(a, e) and 4 (b, f)) for the Sinai billiard with thick and thin walls in the 
corresponding hybridization regions shown in Fig. 2a and b, respectively. 
c, d, g, The superpositions of modes 3 and 4 that match the experimentally 
imaged modes shown in Fig. 2; c (boxed in blue) and d (boxed in red) 
correspond to the blue and red curves of Fig. 2a, respectively, while g (boxed in 
red and blue) corresponds to the crossing point in Fig. 2b. The relative 
populations of pure modes in the superposition states are: c, $|\alpha|^{2}$ = 0.85 and 
$|\beta|^{2}$ = 0.15; d, $|\alpha|^{2}$ = 0.65 and $|\beta|^{2}$ = 0.35; g, $|\alpha|^{2}$ = 0.60 and $|\beta|^{2}$ = 0.40. 
Sea-level rise can threaten the long-term sustainability of coastal 
communities and valuable ecosystems such as coral reefs, salt 
marshes and mangroves’”. Mangrove forests have the capacity to 
keep pace with sea-level rise and to avoid inundation through 
vertical accretion of sediments, which allows them to maintain 
wetland soil elevations suitable for plant growth’. The Indo- 
Pacific region holds most of the world’s mangrove forests*, but 
sediment delivery in this region is declining, owing to anthro- 
pogenic activities such as damming of rivers’. This decline is of 
particular concern because the Indo-Pacific region is expected to 
have variable, but high, rates of future sea-level rise’. Here we 
analyse recent trends in mangrove surface elevation changes across 
the Indo-Pacific region using data from a network of surface eleva- 
tion table instruments*”°. We find that sediment availability can 
enable mangrove forests to maintain rates of soil-surface elevation 
gain that match or exceed that of sea-level rise, but for 69 per cent of 
our study sites the current rate of sea-level rise exceeded the soil 
surface elevation gain. We also present a model based on our field 
data, which suggests that mangrove forests at sites with low tidal 
range and low sediment supply could be submerged as early as 2070. 

Intertidal mangrove forests occur on tropical and subtropical shor- 
elines, and provide a wide range of ecosystem services, including the 
support of fisheries, coastal protection and carbon sequestration, 
which are collectively and conservatively estimated to be worth 
US$194,000 per hectare per year (refs 11, 12). Although mangrove 
tree species are able to tolerate inundation by tides, they can die and 
their former habitat can convert to open water or tidal flats when sea- 
level rise (SLR) causes the frequency and duration of inundation to 
exceed species-specific physiological thresholds”, resulting in shore- 
line retreat'*. In low-sediment-supply systems such as Caribbean 
atolls, the capacity of the soil surface to keep pace with SLR is strongly 
dependent on the accumulation of organic matter derived from roots 
that decompose slowly in anaerobic soils'*. But sediment accretion on 
the soil surface in the Indo-Pacific region can also play a crucial role in 
surface elevation gains’®. 

Changes in the elevation of the soil surface over time can be mea- 
sured using the surface elevation table-marker horizon (SET-MH) 
methodology*’, which has been widely used and recommended for 
monitoring intertidal surface-elevation trajectories in coastal wet- 
lands’*. Here we use an extensive network of SET-MH stations 
(Fig. 1) with records of 1-16.6 years in length to investigate the role 
of sediments in maintaining surface elevation gain in these Indo- 
Pacific mangrove forests and to identify their vulnerability to future 
SLR. Recent trends in mangrove surface elevation change across 27 
sites in the Indo-Pacific (Supplementary Table 1) were analysed with 
respect to environmental factors, including suspended-matter concentration and the 
regional rate of SLR obtained from tide gauges. Future vulnerability to SLR was 
modelled on the basis of the results of this analysis using a surface elevation change 
model and likely future SLR scenarios. 

Throughout the Indo-Pacific region, we found that mangrove soil- 
surface elevation gains are strongly dependent on rates of accretion of 
sediment on the soil surface (Fig. 2a) as well as subsurface organic 
matter accumulation, which has been observed in sites in the 
Caribbean’’. One site in southeast Java, Indonesia, has particularly 
high rates of surface accretion, owing to a mud-volcano eruption”, 
but even with this site removed from the analysis, surface elevation 
gain remains significantly correlated with sediment accretion 
(R² = 0.259, P < 0.001, F test). As expected from theoretical models'®, 
we found that the concentration of total suspended matter (TSM) in 
the water column, derived from remotely sensed MERIS (medium 
resolution imaging spectrometer) imagery, was proportional to surface 
accretion (Fig. 2b) and to surface elevation gains (Fig. 2c), although the 
relationship between surface elevation and TSM was more variable 
than that observed between surface elevation and locally measured 
rates of surface accretion. These relationships link the supply of sedi- 
ments to the maintenance of soil elevation relative to sea level in 
mangrove forests at regional scales within the Indo-Pacific region. 
Other factors (such as rate of SLR, geomorphology, habitat and dom- 
inant species) explained a smaller proportion of the variation in the 
surface elevation gains (Extended Data Table 1). On the basis of our 
network of SET-MH sites, we conclude that sediment supply is 
important to surface elevation gains and therefore to preventing man- 
grove-forest loss in the future. 

We found that 69% of surface elevation records in the Indo-Pacific 
data set (90 out of a total of 153 SET-MH stations) had rates of surface 
elevation gain that were less than the long-term rate of SLR for the 
region (Extended Data Fig. 1b). The remaining 31% of the records are 
from sites in Australia, New Zealand, Vietnam and Indonesia. Many of 
the sites that had rates of surface elevation gain less than SLR also 
exhibited shallow subsidence (Extended Data Fig. 1a). Shallow subsid- 
ence can be caused by a range of factors that increase compaction of the 
near-surface sediments and that are responsive to local environmental 
factors, including forest degradation’’. But whether subsidence and the 
‘elevation deficit’ relative to local rates of SLR indicate vulnerability 
of these mangrove forests to loss with increasing rates of SLR is 
unknown. If the topography allows the mangrove forest to migrate 
landward, with no anthropogenic barriers (such as infrastructure or 
flood-defence barriers), then mangroves may delay submergence by 
‘back-stepping’ into adjacent habitats*”. However, barriers to landward 
expansion of mangrove forests occur throughout the Indo-Pacific 
region, particularly in sites that have intensive aquaculture, urban 
development and low-lying agricultural land. We have therefore 
assumed that broad-scale landward retreat of human settlements in 


¹School of Biological Sciences, The University of Queensland, Brisbane 4072, Australia. ²Global Change Institute, The University of Queensland, Brisbane 4072, Australia. ³Patuxent Wildlife Research 
Center, United States Geological Survey, Maryland 20708, USA. ⁴Department of Geography, National University of Singapore, 1 Arts Link, Singapore 117570, Singapore. ⁵National Wetlands Research 
Center, United States Geological Survey, Louisiana 70506, USA. ⁶Cambridge Coastal Research Unit, Department of Geography, University of Cambridge, Downing Place, Cambridge CB2 3EN, UK. ⁷School of 
Earth and Environmental Science, University of Wollongong, Wollongong 2522, Australia. ⁸The Institute for Marine Research and Observation, Ministry of Marine Affairs and Fisheries, Bali 82251, Indonesia. 
⁹National Institute of Water and Atmospheric Research, Hamilton 3251, New Zealand. ¹⁰Department of Environmental Sciences, Macquarie University, Sydney 2109, Australia. ¹¹University of Science, 
Vietnam National University, Ho Chi Minh City, Vietnam. ¹²International Crane Foundation, Wisconsin 53913, USA. 


22 OCTOBER 2015 | VOL 526 | NATURE | 559 


©2015 Macmillan Publishers Limited. All rights reserved 


| LETTER 


[Figure 1b schematic labels: deep rod SET (3-20 m deep); live root zone; vertical accretion; consolidated sediment or bedrock.] 


Figure 1 | Map of the Indo-Pacific region study sites and a schematic of the 
SET-MH. a, Study sites are indicated by stars; mangrove forests are shown 
in dark green. The colour of the coastal ocean represents variation in tidal range 
(Aviso + FES2012 tide model), where blue is microtidal (0-2 m), yellow is 
mesotidal (2-4 m) and red is macrotidal (>4 m). b, The SET-MH installation 
monitors changes in soil-surface elevation, surface accretion above a marker 
horizon and shallow subsidence (by difference*”); see Methods for details. 


the region is unlikely as a result of political uncertainty and because in 
many nations coastal inhabitants are ‘trapped’ by a lack of capital and 
available inland sites that would support migration”’. 

To examine the future vulnerability of Indo-Pacific mangroves to 
SLR, we developed a model of mangrove habitat suitability based on 
position in the tidal frame. Mangrove forests persist in the portion of 
the tidal frame from mean sea level (MSL) to the level of the highest 
astronomical tide, which generally corresponds to the highest elevation 
at which mangroves can survive”. This gives rise to what is termed 
a wetland’s ‘elevation capital’, or the potential of an intertidal wetland 
to remain within a suitable inundation regime at that site (that is, above 
MSL) despite subsiding relative to local SLR”. For example, mangrove 
forests occupying high intertidal sites that have a 10-m tidal range 
(such as the Kimberley coast of Australia) would need to lose up to 
5 m of elevation capital to reach MSL. In contrast, high intertidal sites 
with a tidal range of 1 m (such as the Caribbean and parts of Indonesia) 
would have to lose only 0.5 m of elevation to put the entire contemporary 
forest at or below MSL. Assuming that mangrove forest species 
cannot persist below approximately MSL, we estimate the time to 
inundation and thus loss of the forest by using tidal range as a surrogate 
for the elevation capital within the ecosystem. 

Over the range of elevation deficits within our data set, we estimated 
the time until complete submergence of the forest at sites with varying 
tidal range (and thus varying elevation capital). This model, which 
subtracts elevation from the elevation capital over time, assumes constant 
rates of SLR. Assuming an elevation deficit of 20 mm yr⁻¹ (that 
is, sea level rising 20 mm yr⁻¹ faster than mangrove surface elevation 
gain), which occurs at some of our sites owing to high local rates of SLR 
and shallow subsidence (for example, Indonesia), we project complete 
submergence of the forests in 100 years wherever tidal ranges are less 
than 4 m (Extended Data Fig. 2). At an elevation deficit of 6 mm yr⁻¹ 
(the mean elevation deficit for our sites with elevation deficits), we 
estimate it would take 100-300 years for high intertidal forests to be 
lower than MSL, while at low elevation deficits (1 mm yr⁻¹) the forests 
may persist for thousands of years. The palaeorecord is consistent with 
high levels of persistence of mangroves through time when rates of SLR 
are low to moderate (that is, low levels of elevation deficits). For 
example, in Belize there is evidence that mangrove forests persisted 
for long intervals over the Holocene epoch during periods when rates 
of SLR were less than 5 mm yr⁻¹ (ref. 15). Additionally, there is evidence 
that high intertidal mangrove forests in northern Australia persisted 
for thousands of years despite relatively high rates of SLR". 
However, overwhelming flooding and loss of mangrove forests is also 
evident during past rapid rises in sea level’*. 

Figure 2 | The relationship between mangrove soil-surface elevation gains 
and sediment availability. a-c, Relationships between soil-surface elevation 
gains and accretion on the soil surface (a), accretion on the soil surface and 
average annual TSM derived from MERIS satellite imagery (b), and surface 
elevation gains and average annual TSM (c). Data points are coloured as 
follows: pink, Indonesia; dark green, Vietnam; light green, New Zealand; 
yellow, western Australia; dark blue, Micronesia; grey, Singapore; white, eastern 
Australia. Solid lines are linear regressions: a, (surface elevation 
gain) = (−4.44 ± 0.95) + (0.78 ± 0.03) × (surface accretion), R² = 0.849, 
P < 0.0001, F test (for overall significance of the linear regression); b, (surface 
accretion) = (4.15 ± 1.08) + (1.57 ± 0.15) × TSM, R² = 0.443, P < 0.0001, 
F test (excluding data from Porong, Indonesia); c, (surface elevation 
gain) = (1.38 ± 0.83) + (0.51 ± 0.11) × TSM, R² = 0.122, P < 0.0001, F test 
(excluding data from Porong, Indonesia); the indicated uncertainties are 
standard errors. Source Data for this figure are available online. 

Figure 3 | Year in which mangroves are predicted to be submerged at sites 
with low (<2.5 g m⁻³) sediment availability over variation in tidal range 
and rates of SLR. The darkest blue region indicates no submergence predicted 
within the modelling time frame (until 2100). At high sediment supply 
(>2.5 g m⁻³), mangrove forests were not predicted to be submerged by 2100. 
See Methods for further details. 

To synthesize the effects of sediment supply and accelerating rates of 
SLR (and corresponding elevation deficits) on the fate of mangrove 
forests, we formulated a second model that assessed the probable time 
to submergence of mangrove forests over the range of observed rates of 
surface elevation gains with no landward migration and over a range of 
tidal amplitudes. According to the model, mangrove forests are likely 
to persist at sites with high tidal range even with high rates of SLR and 
low levels of sediment availability (Fig. 3), consistent with palaeo- 
observations* and theory’*"*. However, at sites with low tidal range, 
forests will be vulnerable by 2080 at moderate SLR (0.8 m by 2100). 

We cannot estimate the absolute extent of losses of mangrove cover 
over the region because measurements of mangrove forest elevation in 
the region are too coarse; however, our model provides a semi-empir- 
ical indication of the conditions under which mangrove loss is likely 
with SLR and locations where management of sediment supply and 
space for landward migration are vital to ensure mangrove forests 
survive into the future. Our model indicates that the outlook for man- 
grove forests in some locations is poor under relatively low rates of 
SLR—the Intergovernmental Panel on Climate Change (IPCC) 
Representative Concentration Pathway 6 (RCP6) scenario—with sub- 
mergence of mangroves by 2070 predicted in the Gulf of Thailand, the 
southeast coast of Sumatra, the north coasts of Java and Papua New 
Guinea and the Solomon Islands (Fig. 4). In contrast, the outlook for the 
persistence of mangroves into the future is more positive in east Africa, 
the Bay of Bengal, eastern Borneo and northwestern Australia, where 
there are relatively large tidal ranges and/or higher sediment supply. 

Our model does not account for long-term and nonlinear feedbacks 
within the system where elevation deficits may be enhanced or reduced, 
for example, through episodic high-wave-energy events that cause ero- 
sion”, degradation of forests”, other stochastic events such as intense 
storms that may alter hydrology or deliver sediment pulses”, or 
changes in ocean circulation that may influence regional rates of 
SLR’. The frequency and intensity of these events are predicted to 
increase under climate-change scenarios’, and all of these factors will 
influence the length of time before forest submergence and loss. Our 
model also does not include subsidence (or uplift) that occurs below the 
SET benchmark’, which in some locations may strongly influence the 
time until submergence. But shifts in the way sediment is managed, and 
reversing forest degradation and thus enhancing organic matter inputs 
to sediments may extend the persistence of mangroves for hundreds of 
years (for example, reducing elevation deficits by 6 mm yr⁻¹ extended 
the time until submergence from 83 years to 167 years for sites with a 
2-m tidal range). In coastal and estuarine systems with reduced 
upstream sediment inputs due to human modifications”, the potential 
for eco-geomorphic feedbacks that delay the onset of mangrove-forest 
loss is diminished. 

Data from our network of sites indicate that the fate of mangroves in 
the Indo-Pacific with SLR is strongly linked to the availability of suspended 
matter, which is important for increases in soil-surface elevation 
and enables mangroves to maintain their elevation within the tidal 
frame above MSL. The importance of sediment supply for the resilience 
of mangrove forests in the face of SLR has been inferred from the 
palaeontological record’* and from recent observed changes in mangrove 
coasts where sediment supply has been reduced, owing to damming 
of rivers’. In Thailand, there has been an 80% reduction of 
sediment supply in the Chao Phraya River delta, which, in combination 
with surface subsidence caused by groundwater extraction, has 
resulted in kilometres of mangrove shoreline retreat*®. Within the 
Mekong River system, planned construction of dams and reductions 
in sediment supply”® will have a devastating effect on the local coastal 
sediment budget and the long-term persistence of mangrove forests. 
Management of the coast and particularly of the river systems that 
deliver much of the sediment to the region is therefore vital for the 
survival of mangrove forests. Although sediment supply at some sites 
may be maintained as a legacy of prior forest clearing of catchments 
(which leads to erosion of soil), the restriction of sediment supply 
caused by the building of dams is a major issue that will contribute 
to mangrove losses in the future. 

More than half the mangrove forests we studied have already lost 
elevation relative to sea level. With constant rates of SLR where tidal 
ranges are large and sediment supplies are maintained, mangrove 
forests in the upper intertidal zone may survive thousands of years 
before they are threatened by submergence. However, under moderate 
emissions scenarios (for example, IPCC RCP6) at sites with low tidal 
ranges and low sediment supply, mangrove forests may be lost by 2080. 
Our work emphasizes the urgent need to plan for the maintenance of 
sediment supply in river systems that are expected to be heavily modified 
and dammed in the future, to reverse forest degradation that 
reduces organic matter inputs and to plan for the landward migration 
of mangrove forests to higher elevations in locations where sediment 
supply is expected to be restricted. 

Figure 4 | Mangrove forest distribution in the Indo-Pacific region. a, Dark 
green areas indicate current mangrove forests. b-d, Predicted decade of 
mangrove forest submergence, indicated by the colour scale in c, for IPCC 
RCP6 (0.48 m SLR by 2100) (b), RCP8.5 (0.63 m SLR by 2100) (c), and a more 
extreme scenario (1.4 m SLR by 2100) (d). The darkest blue region indicates no 
submergence predicted within the modelling time frame (until 2100). The 
model assumes landward migration of mangrove forests is not possible. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 

SET-MH method description. The SET and the later-developed rod-SET consist 
of a benchmark rod driven in sections through the soil profile to resistance, often 
to a depth of 10-25 m in the soil or until bedrock is reached (Fig. 1b). After 
installation of the benchmark rod, a portable horizontal arm is attached, and fixed 
points (usually four positions around the top of the rod) are used to measure the 
distance to the substrate surface using a series of vertical pins lowered to the soil 
surface (Fig. 1b). Total surface height measurements have confidence intervals of 
±1.3 mm (ref. 8). SET data are usually complemented with monitoring of accre- 
tion on the soil surface using artificial soil marker horizons typically made of 
feldspar, sand or other resistant material, which simultaneously allows users to 
quantify rates of vertical surface accretion (that is, sediment deposition; Fig. 1b). 
The complete SET-MH installation provides observations of net surface elevation 
change above the benchmark depth as well as accretion on the surface of the 
wetland. These values may be compared to infer whether surface or subsurface 
processes are contributing to surface elevation gains. For example, if accretion on 
the soil surface is equivalent to surface elevation gain, then accretion on the soil 
surface, whether of mineral or organic origin, is the major process contributing to 
elevation gain. However, if elevation gains are less than surface accretion, then 
shallow subsidence of the soil volume is inferred (Extended Data Fig. 1). 
Conversely, if elevation gains are greater than surface accretion, then expansion 
of the subsurface soil profile is inferred, which may be due to root growth*?*”. 
Over many sites it has been repeatedly shown that vertical accretion on the soil 
surface is not a valid substitute for surface elevation change and that the complete 
set-up is necessary to identify the contribution of surface and shallow subsurface 
processes to surface elevation change at a specific site’***?°. Repeated measure- 
ments allow description of net surface elevation change, which can be integrated 
with region-specific relative SLR (for example, tide-gauge data; see Supplementary 
Information) to determine whether the surface elevation of mangroves has kept 
pace with SLR over that time period. 

Analysis of variation in surface elevation. Linear regression was used to describe 
the relationships between: (1) surface elevation gains and accretion of sediment on 
the soil surface; (2) accretion on the surface and TSM; and (3) surface elevation 
gains and TSM. Forms of these relationships are given in the legend of Fig. 2. 
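
For readers who want to apply the fitted relationships directly, the sketch below encodes the three linear regressions using the intercepts and slopes quoted in the legend of Fig. 2. The function and dictionary names are illustrative, and the units (mm yr⁻¹ for elevation gain and accretion, g m⁻³ for TSM) are assumed from context rather than stated in the legend.

```python
# (intercept, slope) pairs copied from the Fig. 2 legend; standard errors omitted.
FIG2_FITS = {
    "gain_from_accretion": (-4.44, 0.78),  # elevation gain vs surface accretion
    "accretion_from_tsm": (4.15, 1.57),    # surface accretion vs TSM
    "gain_from_tsm": (1.38, 0.51),         # elevation gain vs TSM
}


def predict(fit_name, x):
    """Evaluate one of the fitted lines at predictor value x."""
    intercept, slope = FIG2_FITS[fit_name]
    return intercept + slope * x


def break_even_accretion():
    """Surface accretion at which the fitted elevation gain is zero."""
    intercept, slope = FIG2_FITS["gain_from_accretion"]
    return -intercept / slope
```

Because the first fit has a slope below one and a negative intercept, the fitted elevation gain is smaller than the accretion rate itself at any positive accretion rate. These point estimates ignore the quoted standard errors and the scatter summarized by R².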

The relative influence (in per cent) of predictor variables on surface elevation 
change (in millimetres per year) was analysed using boosted regression tree (BRT) 
models” developed using data from 153 observations and 10 predictors, with a 
tree complexity of 5 and learning rate of 0.005. We developed three models 
using three different measures of sea-level variation at each site (Supplementary 
Table 1): the long-term rate of SLR at tide gauges (model 1); the rates of sea-level 
change over the period of the surface elevation gain measurements at tide gauges 
(model 2); and the rates of sea-level change based on satellite altimetry (model 3). 
On the basis of cross-validation, the mean percentages (s.e.m.) of deviance 
explained by models 1-3 are 44.8% (±10.5%), 38.2% (±9.3%) and 40.9% 
(±8.9%), respectively. BRT modelling was done with R version 3.0.2 using 
packages dismo and gbm. The BRTs were built with a 10-fold cross-validation 
optimization, with a Gaussian distribution for surface elevation change. 
Stochasticity (bag fraction) was set to 0.5. The final models were fitted with 
5,850 trees. Geomorphological setting followed the classifications of ref. 30: river 
delta, tidal, lagoon or carbonate island. Ecological habitat followed the classifica- 
tions of ref. 31: fringe, scrub, hammock, basin, overwash or riverine. Dominant 
tree genera were Avicennia, Rhizophora and Sonneratia; mangrove forests were
classified as mixed forests at sites where no single genus was dominant.
Estimating time to submergence. Years to submergence over variation in tidal 
range was estimated by summing annual elevation deficits (Extended Data Fig. 2). 
Elevation deficit is the difference between the rate of local SLR and the rates of 
surface elevation gain. Where elevation deficits in our data were observed 
(N = 103), the mean elevation deficit over our sites was 6 mm yr⁻¹ (dashed line in
Extended Data Fig. 2). Extended Data Fig. 2 shows the years to submergence (on a
logarithmic scale) of the highest intertidal mangrove forest over variation in tidal
range (microtidal, blue; mesotidal, yellow; macrotidal, red), for a range of elevation
deficits (1–20 mm yr⁻¹).
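With a constant annual deficit, summing deficits until they equal the available elevation reduces to a single division. The sketch below assumes, as in the caption of Extended Data Fig. 2, that the elevation available to the highest intertidal forest is half the tidal range; the 2-m tidal range and the function name are illustrative assumptions.

```python
import math


def years_to_submergence(tidal_range_m: float, deficit_mm_yr: float) -> float:
    """Years until the cumulative elevation deficit equals half the tidal range."""
    if deficit_mm_yr <= 0:
        return math.inf  # no deficit: the surface keeps pace with SLR
    elevation_capital_mm = 0.5 * tidal_range_m * 1000.0
    return elevation_capital_mm / deficit_mm_yr


# Deficits shown in Extended Data Fig. 2, applied to a hypothetical 2-m tidal range.
for deficit in (1.0, 6.0, 12.0, 20.0):
    print(deficit, round(years_to_submergence(2.0, deficit), 1))
```

Because the result scales inversely with the deficit, a logarithmic axis for years to submergence is a natural choice.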
Model to predict the year of submergence of mangrove ecosystems. A model to 
predict the year of submergence of mangrove ecosystems subject to accelerating 
rates of SLR was developed for various physical environmental contexts. The 
model was based on the observed rates of mangrove surface elevation change as 
a function of rate of SLR, suspended sediment availability and tidal range. The 
model was run from 2010 to 2100 in 10-year time steps (see Extended Data Fig. 3 
for a summary of the modelling process). 

The vertical range of mangrove distribution was assumed to be the upper 50% of 
the tidal range”. For example, if the tidal range was 1 m, the vertical distribution of 
mangroves was assumed to be 0.5 m. Assuming that mangroves were at the upper
vertical limit of their range at the start of the simulations, the time until net
elevation loss was equivalent to 50% of the magnitude of tidal range (in metres)
was calculated as the time to mangrove submergence. In each time step, elevation 
deficit was calculated as the magnitude of SLR minus the magnitude of surface 
elevation gain. Total elevation deficit over the 90-year simulation was calculated by 
summing the accumulated elevation deficits. 

Elevation gain (in millimetres per year) caused by sediment accumulation for 
particular SLR and suspended-sediment scenarios was calculated according to the 
observed surface elevation data. The slope and intercept of linear models relating 
elevation gain to rate of SLR are given in Extended Data Table 2. The relationship 
between surface elevation gain and rate of SLR (in millimetres per year) was 
established for two TSM classes: low (<2.5 g m⁻³) and high (>2.5 g m⁻³).
Linear regression was used to establish the functional forms of the relationships 
between surface elevation gain and the rate of SLR. 
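The decadal loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name is hypothetical, the year of submergence is taken as the end of the decade in which the cumulative deficit first reaches half the tidal range, and the example sequence of SLR rates is made up. The slope and intercept are the low-TSM values from Extended Data Table 2.

```python
from typing import Optional, Sequence


def year_of_submergence(
    slr_rates_mm_yr: Sequence[float],
    tidal_range_m: float,
    slope: float,
    intercept: float,
    start_year: int = 2010,
    step_years: int = 10,
) -> Optional[int]:
    """Return the year at which mangroves submerge, or None if they persist."""
    capital_mm = 0.5 * tidal_range_m * 1000.0  # upper 50% of the tidal range
    cumulative_deficit_mm = 0.0
    for step, rate in enumerate(slr_rates_mm_yr):
        # Linear model: elevation gain (mm/yr) as a function of SLR rate (mm/yr).
        gain_mm_yr = slope * rate + intercept
        # Deficit for this time step: SLR minus surface elevation gain.
        cumulative_deficit_mm += (rate - gain_mm_yr) * step_years
        if cumulative_deficit_mm >= capital_mm:
            return start_year + (step + 1) * step_years
    return None


LOW_TSM = {"slope": 0.36, "intercept": 0.29}  # Extended Data Table 2, low TSM

# Hypothetical SLR rates for nine decades (2010 to 2100), in mm/yr.
rates = [3.0 + 3.0 * k for k in range(9)]
print(year_of_submergence(rates, 0.5, **LOW_TSM))  # 0.5-m tidal range
print(year_of_submergence(rates, 2.0, **LOW_TSM))  # 2.0-m tidal range
```

Deficits are not clamped at zero, so a decade in which elevation gain exceeds SLR reduces the accumulated deficit; whether the original model did the same is not stated in the text.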

Scenarios of tidal ranges from 0.5 m to 2.0 m in 0.5-m increments were examined 
(4 total). Six SLR trajectories were simulated for a total of 24 simulations. The starting 
rate of SLR for each trajectory was 3 mm yr⁻¹, equivalent to the current rate of global
average SLR. The rate of change of sea-level increase was varied by 0.5 mm yr⁻¹ in
decadal time steps for the six trajectories: SLR increased by 0.5 mm yr⁻¹ each decade
in the first trajectory, 1 mm yr⁻¹ in the second, 1.5 mm yr⁻¹ in the third and so on, up
to 3.0 mm yr⁻¹ each decade in the last trajectory. The resultant magnitudes of sea-
level change for the six trajectories were 0.45 m, 0.63 m, 0.81 m, 0.99 m, 1.17 m and
1.35 m by 2100. The model was run for each of the tide-range (N = 4) and SLR
(N = 6) trajectory combinations, for each of the sediment availability scenarios (low 
and high), for a total of 48 simulations. 
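The stated totals can be reproduced with a few lines of arithmetic. The sketch below assumes that the first decade (2010 to 2020) runs at the starting rate of 3 mm per year and that the rate steps up at the start of each later decade; the function name is illustrative.

```python
def total_rise_m(accel_mm_yr_per_decade: float,
                 start_rate_mm_yr: float = 3.0,
                 start_year: int = 2010,
                 end_year: int = 2100,
                 step_years: int = 10) -> float:
    """Cumulative sea-level rise (m) for a stepwise-accelerating trajectory."""
    n_steps = (end_year - start_year) // step_years
    total_mm = sum(
        (start_rate_mm_yr + accel_mm_yr_per_decade * k) * step_years
        for k in range(n_steps)
    )
    return total_mm / 1000.0


# Six trajectories: the rate increases by 0.5, 1.0, ..., 3.0 mm/yr each decade.
totals = [round(total_rise_m(0.5 * i), 2) for i in range(1, 7)]
print(totals)  # [0.45, 0.63, 0.81, 0.99, 1.17, 1.35]

# 4 tidal ranges x 6 trajectories x 2 sediment classes.
n_simulations = 4 * 6 * 2
```

Under that reading the six totals match the magnitudes quoted in the text, and the run count comes to 48.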

We then created spatial layers of the model. The TSM layer was classified as high 
(>2.5 g m⁻³) or low (<2.5 g m⁻³). The tidal-range layer was sourced from the
FES2012 tidal model package distributed by AVISO, with support from CNES 
(http://www.aviso.altimetry.fr/)*?. We ran the model for each pixel that contains 
mangroves, as indicated by the data presented in ref. 4, for three SLR scenarios 
(RCP6.5, RCP8.5 and a higher, 1.4-m SLR by 2100 scenario based on ref. 33). 

There are a number of assumptions and limitations to this approach. First, we
assumed that mangroves commenced the simulations at the upper vertical limit of
their range. Therefore, mangroves at lower vertical extents would submerge earlier 
and the model is an optimistic estimate of time until submergence. Second, while 
feedbacks between surface elevation change and other environmental features 
(sediment supply, vertical location in the tidal frame and so on) were not explicitly 
incorporated, they were implicitly included as they would have contributed to the 
observed SET data upon which the model was built. Third, we assumed that 
mangroves would be submerged when they reached mean sea level (MSL; 50% of the tidal range).
However, mangroves may be able to persist beyond this time (that is, there may be 
a time lag), owing to physiological tolerance and acclimatization. If this time lag 
were to exist, then it would extend the time frame for which mangroves would be 
expected to survive after submergence. Lastly, the model does not consider the area 
of habitat, or predict when new habitat would become available. 
Time scales of soil surface elevation records. The timescale of SET measure- 
ments is relatively short compared to the timescales of ecosystem change in res- 
ponse to SLR; therefore, to assess whether SET measurements are representative of 
longer term rates of surface elevation change we took two approaches. The first 
uses SET records of differing lengths to compare shorter- and longer-term rates. 
The second compares SET elevation gains with those inferred from ²¹⁰Pb dating of
sediment cores (over the scale of decades) for the few sites where sediment dating 
and SET data are available. 

To assess whether the length of the SET record is likely to influence our results, 
surface elevation gains measured over longer periods (mean record length of 
5.5 years) were compared to those over shorter periods (mean record length of 
2.1 years) for three sites (New Zealand, N = 3; Micronesia, N = 13; Moreton Bay, 
Australia, N = 18). Longer-term and shorter-term rates were highly correlated 
(R² = 0.59) with a slope of 0.90 ± 0.13 which was not statistically different from 1
(t = 0.769, P = 0.45) (Extended Data Fig. 4). The lengths of the SET records were 
not correlated with surface elevation gain, surface accretion, shallow subsidence or 
elevation deficits relative to SLR. Six SETs in Micronesia have now been monitored 
for 16.6 years. At this site, surface elevation gains at 16.6 years were correlated with 
surface elevation gains at 6.6 years: (surface elevation at 16.6 yr) = −0.16 + (0.36 ±
0.12) × (surface elevation at 6.6 yr), R² = 0.59. Thus, in Micronesia, the long-term
elevation gain was approximately 40% of the short-term rate, indicating compaction
of the sediment profile over time. ²¹⁰Pb dating of sediment cores and SET data are
available for the New Zealand site and also for locations on the east coast of 
Australia. In New Zealand, sediment accumulation rates measured using SETs 
and those using ²¹⁰Pb (from the 1960s to the present) are similar. In Moreton
Bay, the mean rate of mangrove sediment accumulation using ²¹⁰Pb was
1.2 ± 0.9 mm yr⁻¹, which is lower, but in the range of that observed using SETs
in similar habitats (1.7 ± 0.5 mm yr⁻¹; ref. 35); in southeastern Australia, the ²¹⁰Pb rate was
1.7 ± 0.3 mm yr⁻¹, which is higher than that observed using SETs (0.72 ± 0.49 mm
yr⁻¹; ref. 35). Additionally, rates of surface elevation gain measured with SETs in the
Caribbean and Florida are broadly consistent with sediment accumulation rates
derived from ¹⁴C dating'’ and ²¹⁰Pb dating**. The study in Florida®* found sediment
accumulation based on ²¹⁰Pb was on average 81% of that measured using 2.5-yr SET
records. Compaction can be caused by loss of pore space due to dewatering and grain 
packing, and compression and decomposition of organic matter, which may not 
occur linearly over time”. Variation in sediment characteristics is likely to lead to
variable rates of compaction over the Indo-Pacific region. If high rates of compac- 
tion are typical, then our short-term rates may over-estimate surface elevation gains 
for the region. 
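The slope comparison reported above for longer- versus shorter-term SET records can be checked by hand. The text does not spell out how t was computed; the sketch below assumes a simple test of the fitted slope against 1 using its standard error, which reproduces the reported statistic. The function name is illustrative.

```python
def t_slope_vs_one(slope: float, standard_error: float) -> float:
    """t statistic for the null hypothesis that the regression slope equals 1."""
    return abs(slope - 1.0) / standard_error


# Slope and standard error reported for longer- versus shorter-term rates.
t_value = t_slope_vs_one(0.90, 0.13)
print(round(t_value, 3))  # 0.769
```

The P value additionally depends on the degrees of freedom, which the text does not state, so it is not recomputed here.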

Total suspended matter in coastal waters. In this study we used level-3-pro- 
cessed TSM data at 4-km resolution and binned monthly (data freely available 
from http://hermes.acri.fr/). TSM concentration was derived from the MERIS 
instrument on the European Space Agency’s (ESA) Envisat satellite (390- 
1,040 nm). TSM in coastal waters is an indicator of suspended sediments assoc- 
iated with river run-off and resuspension and is useful in both estuarine and reef 
lagoon waters**. Data products were processed and validated as part of the ESA’s 
DUE GlobColour Global Ocean Colour for Carbon Cycle Research project (for 
more information on data processing see http://www.globcolour.info/CDR_Docs/GlobCOLOUR_PUG.pdf).
The resulting raster grid was displayed in the plate-
carrée projection. TSM was extracted using the open-source software BEAM 
VISAT (http://www.brockmann-consult.de/cms/web/beam/; ESA), using the 
TSM value of the pixel containing, or closest to, the SET site. For 24 sites, we used 
data from the pixel containing the SET site (that is, within 4 km of the site); 3 sites 
were 1 pixel distant and 3 sites were greater than 1 pixel distant (2 pixels for 
Kooragang Island, 5 pixels for Quail and 17 pixels for Porong). The Porong region 
has limited data availability owing to high cloud cover. An annual mean for 2011, 
where data from all sites were available, was calculated by averaging TSM pixel data
for January, April, July and October. In other years, TSM data from many sites were
missing from the data set. We used mean annual data in 2011 to assess relationships
between sediment accretion and TSM. Comparison of TSM values over the
different years the data were available (since 2002) found that spatial differences 
were consistent over years. The relationship between mean TSM in 2011 and mean 
over the available record is shown in Extended Data Fig. 5. The linear regression of 
this relationship is (mean TSM in 2011) = (1.38 ± 0.94) + (0.58 ± 0.08) × (mean
TSM over all available years), R² = 0.64, P < 0.0001, F test, where the indicated
uncertainties are standard errors. 
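The annual averaging and the regression above can be written out as a short sketch. The function names and the monthly values are hypothetical; only the four months and the regression coefficients come from the text.

```python
from typing import Mapping, Sequence


def annual_mean_tsm(monthly_tsm: Mapping[int, float],
                    months: Sequence[int] = (1, 4, 7, 10)) -> float:
    """Mean TSM (g per cubic metre) over January, April, July and October."""
    values = [monthly_tsm[m] for m in months]
    return sum(values) / len(values)


def tsm_2011_from_long_term(mean_all_years: float) -> float:
    """Fitted mean TSM in 2011 given the mean over all available years."""
    return 1.38 + 0.58 * mean_all_years


# Hypothetical monthly TSM values for one pixel, keyed by month number.
pixel = {1: 2.0, 4: 4.0, 7: 6.0, 10: 8.0}
mean_2011 = annual_mean_tsm(pixel)
fitted = tsm_2011_from_long_term(10.0)
```

Sampling one month per quarter keeps the annual mean from being dominated by a single season while tolerating gaps in the other months.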
Extended Data Figure 1 | Frequency distributions of values of shallow
subsidence and elevation deficits. a, The frequency distribution of shallow 
subsidence over all the SET sites, calculated as (surface accretion) — (surface 
elevation gain). (The data presented here are available online from the Source 
Data of Fig. 2). b, The frequency distribution of surface elevation deficits 
relative to SLR from tide gauges (see Supplementary Table 1). 
Extended Data Figure 2 | Years until submergence (logarithmic scale) of the
highest intertidal mangrove forest over variation in tidal range and for a
range of elevation deficits. The elevation deficit is the difference between the
rate of local SLR and the rate of surface elevation gain. Submergence is assumed
to occur when the cumulative elevation deficit is equivalent to the elevation
capital (defined as half the tidal range). The mean elevation deficit in our study
was 6 mm yr⁻¹ (dashed line); other elevation deficits shown are 12 mm yr⁻¹
(mean + SD = 6 + 6.3; long-dashed line), 1 mm yr⁻¹ (minimum; dotted line)
and 20 mm yr⁻¹ (maximum; solid line). Categories of tidal range are coloured
blue for microtidal, yellow for mesotidal and red for macrotidal.




[Flow chart. Inputs: total suspended matter (high or low; constant) and rate of sea-level rise (mm per decade; accelerates each decade). Time step, 10 years; model duration, 2010–2100. Surface elevation gain (mm per decade) is based on the empirically derived relationship between TSM and SLR (see Extended Data Table 2). The elevation deficit each decade (by subtraction) is compared with ½ tidal range (m; proxy for "elevation capital"): if ½ tidal range is less than the elevation deficit, the site is submerged and the year of submergence is recorded; if ½ tidal range is greater than the elevation deficit, the site is not submerged and the step is repeated for the next decade. The process is repeated for different tidal ranges and for different accelerating SLR trajectories (0.45–1.35 m by 2100).]

Extended Data Figure 3 | Schematic summary of the modelling process for estimating the decade of submergence of mangrove forests with SLR. 
Extended Data Figure 4 | Comparison of surface elevation gains measured
over longer and shorter periods for three sites. The three sites are New
Zealand (N = 3), Micronesia (N = 13) and Moreton Bay, Australia (N = 18).
The long-term mean record length is 5.5 years; the short-term mean record
length is 2.1 years. Longer-term and shorter-term rates were highly correlated
(R² = 0.59) with a slope of 0.90 ± 0.13, which is not statistically different from 1
(t = 0.769, P = 0.45).
Extended Data Figure 5 | The relationship between mean TSM in 2011 and mean TSM over
the available TSM record (2002–2011). In 2011, all sites were represented in
the MERIS data set. The linear regression (solid line) of this relationship is
(mean TSM in 2011) = (1.38 ± 0.94) + (0.58 ± 0.08) × (mean TSM over all
available years), R² = 0.64, P < 0.0001, F test, where the indicated uncertainties
are standard errors. Dashed lines are 95% confidence intervals.
Extended Data Table 1 | Summary of the relative influence of predictor variables on surface elevation change for BRT models

Model predictor                                           Relative influence (%)

Model 1
Total suspended matter (annual mean, g m⁻³)               36.47
Sea level change at tide gauge (mm/year)                  29.32
Longitude                                                  8.81
Geomorphological setting                                   7.19
Ecological habitat                                         5.36
Latitude                                                   5.20
Dominant tree genera                                       3.68
Annual rainfall (mm)                                       2.84
Tidal range (m)                                            1.12

Model 2
Total suspended matter (annual mean, g m⁻³)               37.40
Sea level change during the SET measurement (mm/year)     30.00
Longitude                                                  8.56
Geomorphological setting                                   6.99
Ecological habitat                                         5.82
Latitude                                                   4.00
Dominant tree genera                                       4.08
Annual rainfall (mm)                                       2.12
Tidal range (m)                                            1.06

Model 3
Total suspended matter (annual mean, g m⁻³)               53.38
Sea level change from satellite altimetry (mm/year)        3.02
Longitude                                                 14.22
Geomorphological setting                                   6.77
Ecological habitat                                         5.08
Latitude                                                   4.97
Dominant tree genera                                       5.72
Annual rainfall (mm)                                       3.17
Tidal range (m)                                            3.66


See Methods for descriptions of the different models and predictors. 
Extended Data Table 2 | Parameters used in the model for estimating time to submergence of mangrove forests for different sediment
availability (classes of TSM).

Sediment availability class   Slope   Standard error of estimate   Intercept   Standard error of estimate   R
Low (<2.5 g m⁻³)              0.36    0.04                          0.29       0.53                         0.45
High (>2.5 g m⁻³)             5.84    0.28                        −24.6        2.49                         0.85


The model is described in Extended Data Fig. 3. TSM values were obtained from MERIS data; see Methods. The parameters describe the linear regression relating surface elevation gain (in millimetres per year) to 
SLR (in millimetres per year) for two sediment availability bins, averaged over all tidal ranges; see Fig. 3. 
Kidney organoids from human iPS cells contain
multiple lineages and model human nephrogenesis 


Minoru Takasato, Pei X. Er, Han S. Chiu, Barbara Maier, Gregory J. Baillie, Charles Ferguson, Robert G. Parton,
Ernst J. Wolvetang, Matthias S. Roost, Susana M. Chuva de Sousa Lopes & Melissa H. Little


The human kidney contains up to 2 million epithelial nephrons 
responsible for blood filtration. Regenerating the kidney requires 
the induction of the more than 20 distinct cell types required for 
excretion and the regulation of pH, and electrolyte and fluid bal- 
ance. We have previously described the simultaneous induction of 
progenitors for both collecting duct and nephrons via the directed 
differentiation of human pluripotent stem cells’. Paradoxically, 
although both are of intermediate mesoderm in origin, collecting 
duct and nephrons have distinct temporospatial origins. Here we 
identify the developmental mechanism regulating the preferential 
induction of collecting duct versus kidney mesenchyme progeni- 
tors. Using this knowledge, we have generated kidney organoids 
that contain nephrons associated with a collecting duct network 
surrounded by renal interstitium and endothelial cells. Within 
these organoids, individual nephrons segment into distal and 
proximal tubules, early loops of Henle, and glomeruli containing 
podocytes elaborating foot processes and undergoing vasculariza- 
tion. When transcription profiles of kidney organoids were com- 
pared to human fetal tissues, they showed highest congruence with 
first trimester human kidney. Furthermore, the proximal tubules 
endocytose dextran and differentially apoptose in response to cis- 
platin, a nephrotoxicant. Such kidney organoids represent power- 
ful models of the human organ for future applications, including 
nephrotoxicity screening, disease modelling and as a source of cells 
for therapy. 

The mammalian kidney is derived from intermediate mesoderm. 
Cells from the primitive streak (presomitic mesoderm; PSM) migrate 
rostrally to form the intermediate mesoderm”. The intermediate meso- 
derm gives rise to both key kidney progenitor populations, the ureteric 
epithelium and the metanephric mesenchyme, which form the collect- 
ing ducts and nephrons, respectively. Several studies have reported the 
successful differentiation of human pluripotent stem cells (hPSCs) into 
either ureteric epithelium or metanephric mesenchyme in vitro*’. In 
contrast, we previously reported the simultaneous generation of both 
ureteric epithelium and metanephric mesenchyme from hPSCs, 
resulting in the induction of nephrons and collecting ducts’. This 
was paradoxical as it was assumed that the ureteric epithelium arises 
as a side branch of the mesonephric duct, itself forming from the 
anterior intermediate mesoderm, while the metanephric mesenchyme 
is derived from the posterior intermediate mesoderm**. Retinoic acid 
(RA) regulates anterior–posterior patterning in organogenesis with
rostral RA signalling patterning the somites’ (Fig. 1e). Conversely,
the PSM expresses Cyp26, which attenuates RA signalling in the caudal 
embryo'”"’. The PSM is also a strong site of Wnt signalling’’. In our 
previous studies, we demonstrated in vitro that formation of the inter- 
mediate mesoderm required FGF9 or FGF2 (ref. 1). Hence, in vivo we 
assume that the ureteric epithelium forms from early migrating PSM 
cells exposed to FGF9 and RA soon after the primitive streak stage, 
while cells late to migrate, and hence exposed to longer Wnt signalling, 
should give rise to the metanephric mesenchyme”’ (Fig. 1a). To confirm
this, we varied the duration of initial Wnt signalling (using
CHIR99021, an inhibitor of GSK-3) before addition of FGF9 
(Fig. 1b) and monitored markers of the anterior intermediate meso- 
derm and posterior intermediate mesoderm by quantitative PCR. 
Shorter periods of CHIR99021 application induced the anterior inter- 
mediate mesoderm markers, LHX1 and GATA3, whereas longer 
periods increased the posterior intermediate mesoderm markers, 
HOXD11 and EYA1, at day 7. Prolonged expression of the PSM markers,
TBX6 and T, after a longer period in the presence of CHIR99021
suggested a delay in FGF9-induced fate commitment (Fig. 1c), as pre- 
dicted. Immunofluorescence analysis showed that a longer (or shorter) 
period with CHIR99021 induced less (more) anterior intermediate 
mesoderm but more (less) posterior intermediate mesoderm, as indi- 
cated by GATA3 and HOXD11, respectively, at day 7 of differentiation 
(Fig. 1d). These observations persisted after 18 days of culture, with 
dominant ureteric epithelium induction (GATA3⁺PAX2⁺ECAD⁺)
after fewer days in the presence of CHIR99021 and preferential induction
of metanephric mesenchyme (PAX2⁺ECAD⁻) and its derivatives
(PAX2⁺ECAD⁺) with more days in the presence of CHIR99021
(Extended Data Fig. 1a). Further, we investigated whether RA signalling
also controls anterior–posterior fate patterning of the intermediate
mesoderm using RA or an RA receptor antagonist, AGN193109,
together with FGF9 (Fig. 1e, f). RA promoted ureteric epithelium
induction, whereas AGN193109 inhibited ureteric epithelium but 
enhanced induction of the metanephric mesenchyme lineage 
(Fig. 1g and Extended Data Fig. 1b). 

These results increase our understanding of embryogenesis as well 
as providing a method by which to modulate the relative induction of 
each of the two intermediate mesoderm-derived progenitor popula- 
tions essential for kidney formation. As a result, we modified our 
existing kidney differentiation process to increase the proportion of 
metanephric mesenchyme formed, increase the time in 3D culture and 
actively trigger nephron formation. This optimized approach was 
applied to either human embryonic stem (ES) cells or human induced 
pluripotent stem (iPS) cells and involved an initial 4 days of
CHIR99021, which resulted in the induction of both the ureteric epi- 
thelium and the metanephric mesenchyme in monolayer culture 
(Extended Data Fig. 2), followed by 3 days of FGF9 before transfer 
to organoid culture (Fig. 2a). The resulting aggregates were cultured 
for up to 20 days, during which time they spontaneously formed com- 
plex kidney organoids (Fig. 2b). During normal mouse kidney 
development, nephron formation from the metanephric mesenchyme 
is initiated in response to Wnt9b secreted from the ureteric epithelium. 
In the mouse, ectopic nephron formation can be triggered via the 
addition of canonical Wnt agonists'*. Indeed, maximal nephron num- 
ber per organoid required a pulse of CHIR99021 for one hour after 
forming a pellet (Fig. 2a and Extended Data Fig. 3a). In addition, the 
continued presence of FGF9 after this CHIR99021 pulse was essential 
Figure 1 | Selective induction of either the collecting duct or kidney
mesenchyme lineage. a, Schematic illustrating the mechanism of
anterior–posterior (A–P) patterning of the intermediate mesoderm in the
embryogenesis’. The timing of PSM cell migration determines the timing of 
the exposure to FGF9 and RA, resulting in fate selection between anterior 
intermediate mesoderm and posterior intermediate mesoderm. AI, anterior 
intermediate mesoderm; MM, metanephric mesenchyme; PI, posterior 
intermediate mesoderm; PSM, presomitic mesoderm; UE, ureteric epithelium. 
b, Schematic of three experimental timelines. CHIR, CHIR99021. c, Time 
course quantitative PCR of an initial 7 days (d) of the differentiation from the 
above timings. Experiments were conducted using monolayer culture 
condition (mean ± s.d., n = 3 independent experiments). d, Immunofluorescence
at day 7 of differentiation with the AI marker, GATA3, and the PI


for nephrogenesis, suggesting an additional role for FGF signalling 
after Wnt-mediated nephron induction (Extended Data Fig. 3b). 
Within each organoid, the nephrons appropriately segmented into 4
components, including the collecting duct (GATA3⁺ECAD⁺), the
early distal tubule (GATA3⁻LTL⁻ECAD⁺), early proximal tubule
(LTL⁺ECAD⁻) and the glomerulus (WT1⁺) (Fig. 2c, d). Moreover,
kidney organoids showed complex morphogenetic patterning with 
collecting duct trees forming at the bottom of the organoid, connecting 
to distal and proximal tubules in the middle, with the glomeruli at the 
top of each organoid (Fig. 2e and Supplementary Videos 1 and 2). This 
patterning mimics the tissue organization observed in vivo where 
glomeruli arise in the cortex whereas the collecting ducts radiate 
through the organ from the middle. Here again, the relative level of 
collecting duct versus nephron within individual organoids could be 
varied with the timing of the initial CHIR99021-to-FGF9 switch 
(Extended Data Fig. 4a, b). Next, we performed RNA sequencing of 
whole kidney organoids at day 0, 3, 11 and 18 after aggregation and 3D 
culture. Across this time course we observed a temporal loss of 
nephron progenitor gene expression but an increase in markers of 
multiple nephron segments, including the podocytes, proximal and 
distal tubules (Extended Data Fig. 5 and Supplementary Table 2). 
Transcriptional profiling was performed and compared using an 
unbiased method with human fetal transcriptional data sets from 
marker, HOXD11. Scale bars, 100 µm. Experimental replicates, 3. e, Schematic
illustrating RA signalling after the primitive streak stage. An RA-metabolizing
enzyme, CYP26, is expressed in the PSM region to shield PSM cells from
RA signalling. f, Schematic of three experimental timelines. RA or AGN193109
(AGN) was added with FGF9 after CHIR99021, followed by growth
factor withdrawal (no GFs). Experiments were conducted with monolayer
culture condition. g, Immunofluorescence at day 18 of differentiation from
3 days of CHIR99021 followed by ± RA/AGN. AGN inhibited the AI
specification of early migrating cells, causing posteriorization. At day 18,
GATA3 and HOXD11 mark the UE and the MM, respectively (left panels).
GATA3⁺PAX2⁺ECAD⁺ cells represent the UE whereas GATA3⁻PAX2⁺
cells represent the MM (ECAD⁻) and its derivatives (ECAD⁺) (right panels).
Experimental replicates, 3. Scale bars, 100 µm.


21 human fetal organs/tissues from the first and/or second trimester 
of pregnancy’. This analysis clustered kidney organoids at d11 and 
d18 of culture with first trimester human fetal kidney (Fig. 2f, g and 
Extended Data Fig. 6). At the earlier culture time points (day 0 and 3), 
organoids more closely matched the fetal gonad, an embryologically 
closely related tissue also derived from the intermediate mesoderm. 
In a kidney, the epithelial cell types (nephron and collecting duct) 
are surrounded by a renal interstitium (stroma) within which there is a 
vascular network. As well as forming the metanephric mesenchyme, 
the intermediate mesoderm gives rise to stromal and vascular progeni- 
tors (Fig. 3a)'*’”. We examined kidney organoids for evidence of addi- 
tional cell types and evidence of functional maturation. Collecting 
ducts could be distinguished based on co-expression of PAX2, 
GATA3 and ECAD (Fig. 3b). At d11, nephron epithelia showed proximal
(LTL⁺ECAD⁻) and distal (LTL⁻ECAD⁺) elements (Fig. 3c). By
day 18, proximal tubules matured to co-express LTL with ECAD, with 
cubilin evident on the apical surface (Fig. 3d, e). Transmission electron 
microscopy (TEM) showed distinct epithelial subtypes; cells with few 
short microvilli surrounding an open lumen characteristic of collecting 
duct/distal tubule (Fig. 3k) and typical proximal tubular epithelium 
displaying an apical brush border with tight junctions (Fig. 3l). By day
18, loops of Henle (UMOD⁺) began to form (Fig. 3f). By day 11,
WT1⁺NPHS1⁺ early glomeruli’ comprising a Bowman’s capsule
Figure 2 | Generating a kidney organoid equivalent to the human fetal
kidney in vitro. a, Schematic of the differentiation protocol from hPSCs.
8 CHIR, 8 µM CHIR99021; 200 FGF9, 200 ng ml⁻¹ FGF9. b, Global bright field
observations of self-organizing kidney organoids across a time series. The
success rate of organoid differentiation was 94.2% (138 organoids, 5
experiments). Scale bars, 1 mm. c, Tile scan immunofluorescence of a whole
kidney organoid displaying structural complexity. Scale bar, 1 mm. d, High-
power immunofluorescence microscopy showing a nephron segmented into
4 compartments, including the collecting duct (CD, GATA3⁺ECAD⁺), distal
tubule (DT, GATA3⁻ECAD⁺LTL⁻), proximal tubule (PT, ECAD⁻LTL⁺) and
the glomerulus (G, WT1⁺). Scale bar, 100 µm. e, Confocal microscopy
generating serial z-stack images from the bottom to the top of a day 11 kidney
organoid (Supplementary Videos 1 and 2). Schematic illustrates the position of
different structures within an organoid. Top, middle and bottom images are


with central podocyte formation was seen connected to proximal 
tubules (Fig. 3g). Kidney organoids also developed a
CD31⁺KDR⁺SOX17⁺ endothelial network with lumen formation (Fig. 3h
and Extended Data Fig. 7a, b, c). TEM showed the presence of primary
and secondary foot processes characteristic of podocytes (Fig. 3m). In a
developing kidney, renal interstitium differentiates into pericytes and
mesangial cells'®. As expected, kidney organoids contained PDGFRA⁺
perivascular cells that lie along KDR⁺ endothelia and PDGFRA⁺ early
mesangial cells invaginating the glomeruli, as observed in human fetal
kidney” (Extended Data Fig. 8a, b). Early avascular glomeruli con-
tained basement membrane, as indicated by laminin staining and
TEM, and showed attaching foot processes on the basement mem-
brane (Extended Data Fig. 8c, d). In some instances, glomeruli showed
evidence of endothelial invasion (Fig. 3i and Supplementary Videos 3
and 4), a feature never observed in explanted embryonic mouse kid-
neys”!. Finally, nephrons were surrounded by MEIS1⁺ renal interstitial
cells”, some of which were also FOXD1⁺ (Fig. 3j and Extended Data
Fig. 8e), suggesting the presence of cortical (FOXD1⁺MEIS1⁺) and
medullary (FOXD1⁻MEIS1⁺) stroma. Hence, all anticipated kidney
components form, pattern and begin to mature within these hPSC- 
derived kidney organoids. Consistent with these observations were 
the transcriptional changes across time in culture, with a gradual reduc- 
tion in the nephrogenic mesenchyme and ureteric tip markers followed 
representative images taken through the organoids at the position indicated in
e. Each segment of the nephron is marked (or coloured in schematic) as
described below: collecting ducts, GATA3⁺ECAD⁺ (green dots in yellow);
distal tubules, ECAD⁺ (yellow); proximal tubules, LTL⁺ (red); glomeruli,
NPHS1⁺ (green circles). Scale bars, 100 µm. f, Heat map visualizing the relative
transcriptional identity (score from 0 to 1 determined using the KeyGene 
algorithm'*) of kidney organoids to 13 human fetal tissues. RNA-seq was 
performed on whole kidney organoids from 4 time points (day 0, 3, 11 and 18 
after aggregation) with 3 individual organoids from 1 experiment per time 
point (see Supplementary Table 2). g, A dendrogram showing the hierarchical 
clustering of day 0, 3, 11 and 18 kidney organoids with human fetal organs 
from both first trimester and second trimester, based on 85 key genes 
(Supplementary Table 3) previously defined’*. This clearly shows a close 
match with trimester 1 fetal kidney from day 11 and 18 of culture. 


by the upregulation of genes specific to podocyte, proximal tubule, 
distal tubule and loop of Henle (Extended Data Fig. 5).

The utility of stem-cell derived kidney organoids for disease modelling 
or drug screening will be dependent upon the functional maturation of 
the nephrons within these organoids. To test this, we focused on the 
proximal tubules, a nephron segment that has important roles in solute, 
vitamin, hormone and amino acid reabsorption. The capacity for
cubilin-mediated, proximal tubule-specific endocytosis was demonstrated by
the selective uptake of dextran–Alexa488 from the media by the LTL+
tubules after 24 h of exposure (Fig. 4a and Extended Data Fig. 9a, b). The
proximal tubules represent a particular target for nephrotoxicity due to
the expression of multidrug resistance (such as ABCB1, ABCG2) and
anion and cation transporters (such as the SLC22 gene family).
Cisplatin is one such nephrotoxicant that induces caspase-mediated
acute apoptosis of proximal tubular cells in the kidney. We treated
kidney organoids with 0, 5 and 20 µM cisplatin for 24 h before examining
cleaved-CASP3 antibody staining (Extended Data Fig. 9c). While control
organoids showed occasional apoptotic interstitial cells, both 5 µM and
20 µM cisplatin induced specific acute apoptosis in mature proximal
tubular cells (LTL+ECAD+), whereas immature cells (LTL+ECAD−)
did not undergo apoptosis (Fig. 4b, c). 

In summary, this study demonstrates that by carefully balancing 
anterior–posterior patterning of intermediate mesoderm with small
Figure 3 | Kidney organoids contain differentiating nephrons, stroma and
vasculature with progressive maturation with time in culture. a, Schematic 
illustrating the developmental pathway from intermediate mesoderm (IM) to 
each cellular component of the kidney. CD, collecting ducts; DT, distal tubules; 
LoH, loops of Henle; PT, proximal tubules; POD, podocytes; VASC, vasculature; 
STROM, renal interstitium. b-j, Immunofluorescence of kidney organoids at 
either day 11 or 18. b, Collecting ducts marked by PAX2, GATA3 and ECAD. 
Scale bar, 50 µm. c, d, Early proximal tubules of LTL+ECAD− at day 11
(black arrowheads). LTL+ECAD+ maturing proximal tubules appear by day 18
(white arrowheads). Scale bars, 100 µm. e, Proximal tubules express cubilin
(CUBN). Scale bar, 50 µm. f, Loops of Henle marked by UMOD and ECAD.


molecules it is possible to direct human pluripotent stem cells to form a
complex multicellular kidney organoid that comprises fully segmented 
nephrons surrounded by endothelia and renal interstitium and is tran- 
scriptionally similar to a human fetal kidney. As such, these will 
Figure 4 | Functional maturation of the proximal tubule. a, Dextran uptake
assay showing endocytic ability of LTL+ tubules. Scale bar, 50 µm. b, Treating
kidney organoids with 20 µM cisplatin caused apoptosis in LTL+ECAD+
proximal tubular cells. Apoptotic cells were detected by cleaved caspase 3
antibody-staining (CASP3). Scale bars, 100 µm. c, Quantification of the
number of apoptotic tubules showing mature proximal tubules-specific 
and NPHS1. Scale bar, 50 µm. h, CD31+ endothelia within the renal interstitium.
Scale bar, 200 µm. i, Evidence of endothelial invasion into glomeruli at day 18
of culture. Scale bar, 50 µm. j, The kidney interstitium marked by MEIS1.
Scale bar, 100 µm. k–m, Transmission electron microscopy of kidney organoids.
k, A putative distal tubule with relatively sparse short microvilli (m) and tight
junctions (tj). l, A putative proximal tubule with a lumen filled with extensive
closely packed microvilli characteristic of the brush border (bb). m, Podocytes (p) 
with characteristic large nuclei and primary (pf) and secondary foot (sf) 
processes. Data are representative from a minimum of 3 independent 
experiments. 


improve our understanding of human kidney development. Each kid- 
ney organoid reaches a substantial size with more than 500 nephrons 
per organoid, a number equivalent to a mouse kidney at 14.5 days post-coitum.
While there is room for further improvement with regard to
apoptosis by a nephrotoxicant, cisplatin. In response to 5 µM and 20 µM
cisplatin, LTL+ECAD+ mature proximal tubules (PT) underwent apoptosis
dose-dependently. In contrast, LTL+ECAD− immature PT did not respond to
cisplatin. P values were calculated by independent t-test (mean ± s.e.m., n = 5
independent experiments); NS, not significant. 
tubular functional maturity, glomerular vascularisation and a contiguous
collecting duct epithelium with a single exit path for urine, the
tissue complexity and degree of organoid functionalization observed
here support their use to screen drugs for toxicity, model
genetic kidney disease or act as a source of specific kidney cell types
for cellular therapy. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size, the experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Cell culture and differentiation. All experiments presented used the previously 
described wild-type human iPS cell line CRL1502 (clone C32) generated using 
episomal reprogramming. Undifferentiated human iPS cells were maintained on
mouse embryonic fibroblasts (MEFs) (Millipore) as a feeder layer with human
ES cell (hES) medium as described previously. Cells were authenticated and tested
for mycoplasma infection. Human iPS cells were plated on a Matrigel-coated
(Millipore) culture dish and cultured in MEF-conditioned hES medium (MEF-CM)
until reaching 60–100% confluence. Then, cells were again plated on a
Matrigel-coated dish at 5,000 cells per cm² in MEF-CM. The next day, when cells
had reached 40–50% confluence, they were treated with 8 µM CHIR99021 in APEL basal
medium (STEMCELL Technologies) supplemented with Antibiotic-Antimycotic
(Life Technologies) for 2–5 days, followed by FGF9 (200 ng ml−1) and heparin
(1 µg ml−1) for another 5–2 days, with the medium changed every second day. At
day 7, cells were collected and dissociated into single cells using trypsin or TrypLE
Select (Life Technologies). Cells (0.5 × 10⁶) were spun down at ×400g for 2 min to
form a pellet and then transferred onto a Transwell 0.4 µm pore polyester membrane
(CLS3450, Corning). Pellets were treated with 5 µM CHIR99021 in APEL for
1 h, and then cultured with FGF9 (200 ng ml−1) and heparin (1 µg ml−1) for
5 days, followed by another 6–13 days in APEL basal medium, with the medium
changed three times a week. Culture medium should not overflow over the membrane.
For the differentiation in monolayer cultures, cells after CHIR99021 induction
were treated with FGF9 (200 ng ml−1) and heparin (1 µg ml−1) for 10 days,
followed by APEL basal medium for another 6 days. In some experiments,
RA (0.1 µM) or AGN193109 (5 µM) was added to FGF9 medium. A step-by-step
protocol describing kidney organoid generation can be found at Protocol
Exchange.

Immunocytochemistry. For monolayer cells, antibody staining was performed as 
described previously. For the kidney organoid, organoids were fixed with
2% paraformaldehyde in PBS for 20 min at 4 °C followed by three washes with
PBS. Then organoids were blocked with 10% donkey serum, 0.3% Triton X/PBS
for 2–3 h at room temperature and incubated with primary antibodies overnight at
4 °C. After five washes with 0.1% Triton X/PBS, secondary antibodies were
incubated for 4 h at room temperature. The following antibodies and dilutions
were used: rabbit anti-PAX2 (1:300, 71-6000, Zymed Laboratories), goat anti-
SIX1 (1:300, sc-9709, Santa Cruz Biotechnology), rabbit anti-SIX2 (1:300, 11562- 
1-AP, Proteintech), mouse anti-ECAD (1:300, 610181, BD Biosciences), rabbit 
anti-WT1 (1:100, sc-192, Santa Cruz Biotechnology), mouse anti-HOXD11 
(1:300, SAB1403944, Sigma-Aldrich), goat anti-GATA3 (1:300, AF2605, R&D 
Systems), rabbit anti-JAG1 (1:300, ab7771, Abcam), goat anti-cubilin (1:150, 
sc-20607, Santa Cruz Biotechnology), sheep anti-NPHS1 (1:300, AF4269, R&D
Systems), LTL-biotin-conjugated (1:300, B-1325, Vector Laboratories), DBA- 
biotin-conjugated (1:300, B-1035, Vector Laboratories), mouse anti-KRT8 
(1:300, TROMA, DSHB), mouse anti-CD31 (1:300, 555444, BD Pharmingen), 
rabbit anti-KDR (1:300, 2479, Cell Signaling Technology), goat anti-SOX17 
(1:300, AF1924, R&D Systems), mouse anti-PDGFRA (1:200, 556001, BD 
Pharmingen), rabbit anti-Laminin (1:300, L9393, Sigma-Aldrich), rabbit anti- 
UMOD (1:300, BT-590, Biomedical Technologies), mouse anti-MEIS1 (1:300, 
ATM39795, Active Motif), goat anti-FOXD1 (1:200, sc-47585, Santa Cruz
Biotechnology) and rabbit anti-cleaved-CASP3 (1:300, 9661, Cell Signaling 
Technology). Images were taken using a Nikon Ti-U microscope or a Zeiss 
LSM 780 confocal microscope. All immunofluorescence analyses were successfully
repeated more than three times and representative images are shown. 

Electron microscopy. Organoids were processed for electron microscopy using a 
method as follows. A solution of 5% glutaraldehyde in 2× PBS was added directly
to the organoid culture dish in equal volume to the growth medium and placed
under vacuum for 5 min. The organoid was reduced in size by cutting into small
blocks (~2 × 2 mm), and irradiated in fresh fixative (2.5%), again under vacuum,
for 6 min, in a Pelco Biowave (Ted Pella Inc., Redding, CA) at 80 W power. Samples
were then washed 4 × 2 min in 0.1 M cacodylate buffer. Samples were then
immersed in a solution containing potassium ferricyanide (3%) and osmium 
tetroxide (2%) in 0.1 M cacodylate buffer for 30 min at room temperature.
Following 6 × 3 min washes in distilled water the tissue blocks were then
incubated in a filtered solution containing thiocarbohydrazide (1%) for 30 min at room
temperature. After subsequent washing in distilled water (6 × 2 min) samples
were incubated in an aqueous solution of osmium tetroxide (2%) for 30 min, then
in distilled water (6 × 2 min) and incubated in 1% aqueous uranyl acetate for
30 min at 4 °C. After further distilled water washes (2 × 2 min) a freshly prepared
filtered solution of 0.06% lead nitrate in aspartic acid (pH 5.5) warmed to 60 °C
was added to the dish and further incubated for 20 min at 60 °C before rinsing in
distilled water (6 × 3 min) at room temperature. Tissue blocks were dehydrated
twice in each ethanol solution of 30%, 50%, 70%, 90% and absolute ethanol for 40 s 
at 250 W in the Pelco Biowave. Epon LX112 resin was used for embedding the 
tissue with infiltration at 25%, 50%, and 75% resin:absolute ethanol in the Pelco 
Biowave under vacuum at 250 W for 3 min and finishing with 100% resin (twice), 
before the final embedding/blocking and curing at 60 °C for 12 h.

qRT-PCR analysis. Total RNA was extracted from cells using Purelink RNA mini 
kit (Life Technologies) and cDNA was synthesized from >100 ng total RNA using
SuperScript III reverse transcriptase (Life Technologies). qRT-PCR analyses were
performed with GoTaq qPCR Master Mix (Promega) on a Roche LightCycler 96
real-time PCR machine. All absolute data were first normalized to GAPDH and
then normalized to control samples (ΔΔCt method). The sequences of primers
used for qRT-PCR are as listed in Supplementary Table 1. 
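
As a minimal sketch of the two-step normalization described above (reference gene first, then control sample), the following Python function computes a fold change by the ΔΔCt method. It assumes 100% amplification efficiency, that is, a doubling of product per cycle. The function name and the Ct values are hypothetical and are not taken from this study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene versus control by the 2^(-ΔΔCt) method.

    Assumes perfect amplification efficiency (product doubles every cycle).
    """
    # Step 1: normalize the target gene to the reference gene (e.g. GAPDH).
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    # Step 2: normalize the sample to the control condition.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier
# in the sample than in the control, with an unchanged reference gene.
fold_change = relative_expression(24.0, 18.0, 27.0, 18.0)
print(fold_change)  # 8.0
```

A lower Ct means more starting template, so a ΔΔCt of −3 corresponds to an eightfold increase under the stated efficiency assumption.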

Next generation RNA sequencing and comparative analysis using KeyGenes. 
Sequencing was performed using the Illumina NextSeq500 (NextSeq control soft- 
ware v1.2/Real Time Analysis v2.1) platform. The library pool was diluted and 
denatured according to the standard NextSeq500 protocol and sequencing was 
carried out to generate single-end 76 bp reads using a 75 cycle NextSeq500 High 
Output reagent Kit (Catalog FC-404-1005). Reads were mapped against the
reference human genome (hg19) using STAR, and read counts for each gene in the
UCSC annotation were generated using htseq-count in the HTSeq Python package
(http://www-huber.embl.de/users/anders/HTSeq/doc/index.html). The number
of uniquely mapped reads ranged from 18,810,634 to 36,706,805 per sample.
Normalized read counts were calculated using the DESeq2 package.

KeyGenes was used to generate the identity scores of day 0, 3, 11 and 18 kidney
organoids to different first trimester human organs, including the kidneys
(GSE66302). The dendrogram showing the hierarchical clustering of day 0, 3,
11 and 18 kidney organoids and 21 human fetal organs from first and second 
trimester (GSE66302) was based on the Pearson correlation of the expression 
levels of 85 classifier genes as determined by KeyGenes (http://www.keygenes.nl) 
(Supplementary Table 3). The classifier genes were calculated by KeyGenes using 
the top 500 most differentially expressed genes of the human fetal data without 
including the extraembryonic tissues from that data set. 
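
The dendrogram described above is built on the Pearson correlation of classifier-gene expression levels. The sketch below is not KeyGenes and does not reproduce the hierarchical clustering itself; it only illustrates, assuming that 1 − r is used as the distance, how a sample can be matched to its most similar reference tissue. All expression values, tissue labels and function names are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def nearest_reference(sample, references):
    """Name of the reference profile with the smallest distance 1 - r."""
    return min(references,
               key=lambda name: 1.0 - pearson(sample, references[name]))


# Hypothetical expression of four classifier genes (arbitrary units).
organoid_day18 = [1.0, 2.0, 3.0, 4.0]
fetal_references = {
    "kidney": [2.0, 4.0, 6.0, 8.5],   # rises with the sample profile
    "gonad": [4.0, 3.0, 2.0, 1.0],    # falls as the sample profile rises
}
best_match = nearest_reference(organoid_day18, fetal_references)
print(best_match)  # kidney
```

In a full analysis the same pairwise distances would be computed between every pair of samples and passed to an agglomerative clustering routine to draw the dendrogram.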

Functional analysis for proximal tubules. For dextran uptake assay, organoids at
day 17 were cultured with 10 µg ml−1 of 10,000 MW dextran Alexa488-conjugated
(D-22910, Life Technologies) for 24 h. Organoids were fixed and stained by LTL
without permeabilization. For nephrotoxicity assays, organoids at day 17 were
cultured with 0, 5, 20 or 100 µM cisplatin (Sigma-Aldrich) for 24 h. The ratio of
apoptotic proximal tubules to total proximal tubules was manually counted using 
ImageJ in 2 or 3 representative fields per experiment. In total, n = 5 independent 
experiments. Images were taken using Zeiss LSM 780 confocal microscope. 
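
The quantification described above reduces to a per-experiment ratio followed by a mean and standard error across experiments. A minimal sketch, assuming the counts from the fields of each experiment are pooled into a single ratio; the counts are invented and the helper names are hypothetical.

```python
import math
import statistics


def apoptotic_fraction(apoptotic_tubules, total_tubules):
    """Ratio of apoptotic proximal tubules to all proximal tubules counted."""
    return apoptotic_tubules / total_tubules


def mean_and_sem(values):
    """Mean and standard error of the mean (sample s.d. / sqrt(n))."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical (apoptotic, total) tubule counts for n = 5 experiments.
counts = [(12, 40), (10, 40), (14, 40), (12, 40), (12, 40)]
fractions = [apoptotic_fraction(a, t) for a, t in counts]
mean, sem = mean_and_sem(fractions)
print(f"{mean:.3f} ± {sem:.3f}")  # 0.300 ± 0.016
```

The independent t-test reported in the figure legend would then compare such per-experiment fractions between treatment groups.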


28. Briggs, J. A. et al. Integration-free induced pluripotent stem cells model genetic and
neural developmental features of Down syndrome etiology. Stem Cells 31,
467-478 (2013). 
Extended Data Figure 1 | Antero-posterior intermediate mesoderm
specification is regulated by the timing of FGF9 exposure and the
presence of RA signalling. a, Immunofluorescence at day 18 of monolayer
differentiation from cultures exposed to different timing of FGF9 addition
(after 2, 3, 4 and 5 days of CHIR99021). The ureteric epithelium is represented
by GATA3+PAX2+ECAD+ cells. The metanephric mesenchyme and its
derivatives are marked by PAX2+GATA3−ECAD− (metanephric
mesenchyme) and PAX2+GATA3−ECAD+ (nephrons), respectively. Scale
bars, 100 µm. b, Immunofluorescence at day 7 and 18 of monolayer
differentiation using 5 days of CHIR99021 followed by RA or AGN193109
(AGN) on top of FGF9. RA reduced the specification of posterior intermediate
mesoderm, as indicated by the reduction of HOXD11 at day 7 (top panel).
This resulted in less metanephric mesenchyme but some ureteric
epithelium by RA at day 18 (bottom panel). Scale bars, 100 µm.
Extended Data Figure 2 | Induction of both kidney progenitors at the same
time. a, b, Immunofluorescence at day 18 of the monolayer differentiation
using the 4 days CHIR99021 before FGF9 protocol. The metanephric
mesenchyme is marked by SIX2+SIX1+HOXD11+ cells (a). GATA3+PAX2+ECAD+KRT8+
cells representing the ureteric epithelium were also
induced (b). Scale bars, 50 µm.
Extended Data Figure 3 | Regulation of nephrogenesis in the kidney
organoid. a, Stimulating organoids with 5 µM CHIR99021 for 1 h immediately
after aggregation promoted nephrogenesis (CHIR pulse), whereas only limited
numbers of nephrogenesis events happened without CHIR99021 (no pulse).
Scale bars, 1 mm. b, Without the addition of FGF9 after this CHIR99021 pulse,
organoids did not initiate nephrogenesis (−FGF9). Scale bars, 200 µm.
Extended Data Figure 4 | The timing of FGF9 exposure affects the ratio of
collecting duct to nephron in the kidney organoid. a, b, Immunofluorescence
of kidney organoids at day 18 after aggregation after exposure to different
timings of initial FGF9 exposure (2, 3, 4 and 5 days of CHIR99021 pre-FGF9),
demonstrating the regulation of collecting duct/nephron ratio by varying this
timing. GATA3+ECAD+ cells represent the collecting duct (a), whereas
WT1+NPHS1+ cells mark podocytes of the glomerulus (b). Scale bars, 200 µm.
Extended Data Figure 5 | Changes of gene expression during development
of the kidney organoid. a–c, Graphs showing expression changes of selected
marker genes at 4 time points (day 0, 3, 11 and 18) of the kidney organoid
culture. y axis represents the count of detection for each gene in an RNA
sequencing analysis. Markers of the nephron progenitor (cap mesenchyme)
and collecting duct progenitor (ureteric tip) peaked by day 3 then dropped
(a). Markers of early nephron increased by day 3, while those of mature
nephron components (proximal and distal tubule and podocytes) started after
day 3. Illustrations show expression regions (blue coloured) of each selected
gene in the developing kidney (b). Markers of endothelial and renal interstitial
cells were also increased by day 11 (c).
Extended Data Figure 6 | Transcriptional similarity of the kidney organoid
to human fetal organs. Dendrogram showing the hierarchical clustering of
day 0, 3, 11 and 18 differentiation experiments and 21 human fetal organs from
first and second trimester (Gene Expression Omnibus accession number
GSE66302). Sample name is composed of individual ID followed by an organ
name and gestation week. For instance, 'DJ1 kidney_9' represents a kidney at
the ninth week of gestation from individual ID: DJ1. Day 0 and 3 kidney organoids
cluster with gonad, in agreement with the common origin of both gonad and 
kidney from the intermediate mesoderm. Day 11 and 18 kidney organoids 
show strongest similarity to trimester 1 human kidney. The classifier genes used 
for this analysis are detailed in Supplementary Table 3. 
Extended Data Figure 7 | Evidence of endothelial cells in the kidney
organoid. a, Immunofluorescence of day 11 kidney organoids showing the
presence of CD31+KDR+ endothelial cells surrounding NPHS1+ glomeruli.
Scale bar, 100 µm. b, Two representative images demonstrating the expression
of another endothelium marker SOX17 in CD31+ endothelial cells. Scale bars,
100 µm. c, Immunofluorescence of day 18 kidney organoids displaying
endothelia with lumen formation, as indicated by asterisks. This image also
shows the endothelial invasion into a glomerulus. Scale bar, 100 µm.
Extended Data Figure 8 | Characterization of non-epithelial structures in
the kidney organoid. All images were taken from day 18 kidney organoids.
a, PDGFRA+ pericytic cells attaching on KDR+ vessels. Scale bar, 50 µm.
b, Some glomeruli contained PDGFRA+ cells likely to represent early
mesangial cells. Scale bar, 50 µm. c, Laminin staining (LAM) demonstrates the
presence of basement membrane in glomerulus structures (white arrowheads).
Scale bar, 100 µm. d, TEM images of avascular glomeruli showing early
podocytes surrounding a basement membrane (yellow arrowheads) and
exhibiting foot processes on the basement membrane. e, Immunofluorescence
showing FOXD1 expression in podocytes (WT1+FOXD1+) and a
subpopulation of MEIS1+ interstitium (white arrowheads). This is suggestive
of the presence of both cortical stroma (FOXD1+MEIS1+) and medullary
stroma (FOXD1−MEIS1+). Scale bar, 100 µm.
Extended Data Figure 9 | Functional assay of proximal tubule maturation
within kidney organoids. a, Fluorescent microscopy showing the dextran
uptake in both the kidney organoids and E14 mouse embryonic kidney organ
culture after 24 h presence of dextran–Alexa488 (10 µg ml−1) in the culture
medium (24 h dextran–Alexa488). 1 h incubation was insufficient for either
organoids or mouse kidney explants to take up dextran from the culture media
(1 h dextran–Alexa488). No background signals were detected in a control
without dextran (no dextran). Dashed line circles the organoids and kidneys.
Scale bars, 1 mm. b, Endocytosis mediator cubilin (CUBN) was present on
apical surface of the proximal tubules in kidney organoids (left panel). The
same staining without detergent during the process showed the complete
absence of CUBN staining on apical surface (right panel), demonstrating that
the tubules within the organoids are intact. This explains the requirement
for a 24 h incubation with dextran before evidence of apical uptake. Dashed
line circles LTL+ proximal tubules. Scale bars, 50 µm. c, Low power
immunofluorescence microscopy of day 18 kidney organoids after being
treated by cisplatin for 24 h. No apoptosis was observed in proximal tubules
in the absence of cisplatin (0 µM, left panel). LTL+ECAD+ proximal tubular
cell-specific apoptosis was observed only in response to either 5 µM (not
shown) or 20 µM cisplatin (arrowheads in middle panel). Global cell death was
observed after culture in 100 µM cisplatin (right panel). Scale bars, 100 µm.
A comprehensive phylogeny of birds (Aves) using
targeted next-generation DNA sequencing 


Richard O. Prum, Jacob S. Berv, Alex Dornburg, Daniel J. Field, Jeffrey P. Townsend,
Emily Moriarty Lemmon & Alan R. Lemmon


Although reconstruction of the phylogeny of living birds has pro- 
gressed tremendously in the last decade, the evolutionary history of 
Neoaves—a clade that encompasses nearly all living bird species— 
remains the greatest unresolved challenge in dinosaur systematics. 
Here we investigate avian phylogeny with an unprecedented scale 
of data: >390,000 bases of genomic sequence data from each of 
198 species of living birds, representing all major avian lineages, 
and two crocodilian outgroups. Sequence data were collected using 
anchored hybrid enrichment, yielding 259 nuclear loci with an 
average length of 1,523 bases for a total data set of over 7.8 × 10⁷
bases. Bayesian and maximum likelihood analyses yielded highly 
supported and nearly identical phylogenetic trees for all major 
avian lineages. Five major clades form successive sister groups to 
the rest of Neoaves: (1) a clade including nightjars, other caprimul- 
giforms, swifts, and hummingbirds; (2) a clade uniting cuckoos, 
bustards, and turacos with pigeons, mesites, and sandgrouse; (3) 
cranes and their relatives; (4) a comprehensive waterbird clade, 
including all diving, wading, and shorebirds; and (5) a compre- 
hensive landbird clade with the enigmatic hoatzin (Opisthocomus 
hoazin) as the sister group to the rest. Neither of the two main,
recently proposed neoavian clades (Columbea and Passerea)
was supported as monophyletic. The results of our divergence
time analyses are congruent with the palaeontological record, sup- 
porting a major radiation of crown birds in the wake of the 
Cretaceous–Palaeogene (K–Pg) mass extinction.

Birds (Aves) are the most diverse lineage of extant tetrapod vertebrates.
They comprise over 10,000 living species, and exhibit an extraordinary
diversity in morphology, ecology, and behaviour. Substantial
progress has been made in resolving the phylogenetic history of birds.
Phylogenetic analyses of both molecular and morphological data support
the monophyletic Palaeognathae (the tinamous and flightless
ratites) and Galloanserae (gamebirds and waterfowl) as successive,
monophyletic sister groups to the Neoaves, a diverse clade including
all other living birds. Resolving neoavian phylogeny has proven to be a
difficult challenge because this radiation was very rapid and deep in
time, resulting in very short internodes.

In the last decade, phylogenetic analyses of large, multilocus data 
sets have resulted in the proposal of numerous, novel neoavian rela- 
tionships. For example, a clade consisting of diving and wading birds 
has been consistently recovered, as well as a large landbird clade in 
which falcons and parrots are successive sister groups to the perching 
birds. Recently, phylogenetic analyses of 48 whole avian genomes
resulted in the proposal of a novel phylogenetic resolution of the initial
branching sequence within Neoaves. Although this genomic study
provided much needed corroboration of many neoavian clades, the 
limited taxon sampling precluded further insights into the evolution- 
ary history of birds. 


It has long been recognized that phylogenetic confidence depends 
not only on the number of characters analysed and their rate of evolu- 
tion, but also on the number and relationships of the taxa sampled 
relative to the nodes of interest. Theory predicts that sampling a
single taxon that diverges close to a node of interest will have a far 
greater effect on phylogenetic resolution than will adding more characters.
Despite using an alignment of >40 million base pairs, sparse
sampling of 48 species in the recent avian genomic analysis may not 
have been sufficient to confidently resolve the deep divergences among 
major lineages of Neoaves. Thus, expanded taxon sampling is required 
to test the monophyly of neoavian clades, and to further resolve the 
phylogenetic relationships within Neoaves. 

Here, we present a phylogenetic analysis of 198 bird species and 
2 crocodilians (Supplementary Table 1) based on loci captured using 
anchored enrichment. Our sample includes species of 122 avian
families in all 40 extant avian orders, with denser representation of
non-oscine birds (108 families) than of oscine songbirds (14 families). 
Effort was made to include taxa that would break up long phylogenetic 
branches, and provide the highest likelihood of resolving short internodes
at the base of Neoaves. We also sampled multiple species
within groups whose monophyly or phylogenetic interrelationships 
have been controversial—that is, tinamous, nightjars, hummingbirds, 
turacos, cuckoos, pigeons, sandgrouse, mesites, rails, storm petrels, 
petrels, storks, herons, hawks, hornbills, mousebirds, trogons, king- 
fishers, barbets, seriemas, falcons, parrots, and suboscine passerines. 

We targeted 394 loci centred on conserved anchor regions of the 
genome that are flanked by more variable regions. We performed all
phylogenetic analyses on a data set of 259 genes with the highest 
quality assemblies. The average locus was 1,524 bases in length 
(361-2,316 base pairs (bp)), and the total percentage of missing data 
was 1.84%. The concatenated alignment contained 394,684 sites. To 
minimize overall model complexity while accurately accounting for 
substitution processes, we performed a partition model sensitivity 
analysis with PartitionFinder, and compared a complex partition
model (one partition per locus) to a heuristically optimized (rclust)
partition model. Phylogenetic informativeness (PI) approaches
provided strong evidence that the phylogenetic utility of our data set 
was high, with low declines in PI profiles for individual loci, data set 
partitions, and the concatenated matrix (Supplementary Fig. 4). We 
estimated concatenated trees in ExaBayes and RAxML using a 75
partition model. Coalescent species trees were estimated with the gene
tree summation methods in STAR, NJst, and ASTRAL from gene
trees estimated with RAxML (see Methods).
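
The data set dimensions quoted above can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the text (259 loci, a mean locus length of 1,524 bases, 394,684 aligned sites, and 198 birds plus 2 crocodilians); the variable names are illustrative, and the small gap between the two site counts presumably reflects rounding of the mean locus length.

```python
# Figures taken from the text above.
n_loci = 259
mean_locus_length = 1_524      # bases, rounded mean
alignment_sites = 394_684      # concatenated alignment length
n_taxa = 198 + 2               # bird species plus crocodilian outgroups

# Loci times mean length should approximate the alignment length.
approx_sites = n_loci * mean_locus_length
difference = approx_sites - alignment_sites

# Total matrix size: sites per taxon times number of taxa.
total_bases = alignment_sites * n_taxa

print(approx_sites)   # 394716
print(difference)     # 32
print(total_bases)    # 78936800
```

The product of sites and taxa is consistent with the stated total of over 7.8 × 10^7 bases, before accounting for the 1.84% of missing data.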

Our concatenated Bayesian analyses resulted in a completely 
resolved, well supported phylogeny. All clades had a posterior prob- 
ability (PP) of 1, except for a single clade including shoebill 
(Balaeniceps) and pelican (PP = 0.54) (Fig. 1). The concatenated 
Figure 1 | Phylogeny of birds. Time-calibrated phylogeny of 198 species of
birds inferred from a concatenated, Bayesian analysis of 259 anchored
phylogenomic loci using ExaBayes. Figure continues on the opposite page
from green arrow at the bottom of this panel. Complete taxon data in 
Supplementary Table 1. Higher taxon names appear at right. All clades are 
supported with posterior probability (PP) of 1.0, except for the Balaeniceps- 
Pelecanus clade (PP = 0.54; clade 109). The five major, successive, neoavian 
sister clades are: Strisores (brown), Columbaves (purple), Gruiformes (yellow),
Aequorlitornithes (blue), and Inopinaves (green). Background colours mark
geological periods. Ma, million years ago; Ple, Pleistocene; Pli, Pliocene;
Q, Quaternary. Clade numbers refer to the plot of estimated divergence
dates (Supplementary Fig. 7). Fossil age-calibrated nodes are shown in grey.
Illustrations of representative bird species are depicted by their lineages. See
Supplementary Information for details and further discussion. 


maximum likelihood analysis recovered a single topology that was
identical to the Bayesian tree except for three clades, all of which are
far from the base of Neoaves: the relationships among pigeons; among
skimmers, gulls, and terns; and among pelicans, shoebill, and waders
(Supplementary Fig. 1). Almost all clades in the maximum likelihood
tree were maximally supported with bootstrap scores (BS) of 1.00, but
nine clades within Neoaves (including four of the most inclusive
neoavian clades) received support <0.70 (Supplementary Fig. 1).
Coalescent species tree analyses produced substantially different
hypotheses for neoavian relationships (Supplementary Fig. 3), but
most of the discordant clades received conspicuously lower bootstrap
support values (0.07 < BS < 0.30). Quantifying the phylogenetic
informativeness of individual loci revealed that these low support
values were not due to homoplasy driven by saturation of nucleotide 
states, but rather by the low power of individual loci to resolve the 
entire range of internode lengths across the depth of the tree 
(Supplementary Figs 4 and 5; see Methods). This result was not unex- 
pected. The low phylogenetic information content of individual genes 
at deep timescales has been demonstrated to impede phylogenetic 
resolution in a coalescent species tree framework. Furthermore,
when clades with <0.75 bootstrap support values in the species trees 
are collapsed, the resulting topology is exactly congruent with the con- 
catenated Bayesian tree (except for the relationships of tinamous 
among palaeognaths; Supplementary Fig. 3). Although coalescent spe- 
cies trees account for incomplete lineage sorting, simulations show that 
species tree methods based on gene tree summation may not provide 
significantly better performance over concatenation methods.
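
Collapsing weakly supported clades, as described above for the species trees, turns each low-support internal node into a polytomy on its parent. The following is a minimal sketch of that operation on a toy tree, not the procedure or software used in this study. The tree encoding (a leaf is a string, an internal node is a `(support, children)` tuple), the taxon labels and the support values are all hypothetical.

```python
def collapse(node, threshold):
    """Collapse internal nodes with support below threshold into polytomies.

    A leaf is a string; an internal node is a (support, children) tuple.
    The root itself is never collapsed because it has no parent.
    """
    if isinstance(node, str):
        return node
    support, children = node
    new_children = []
    for child in children:
        child = collapse(child, threshold)
        if not isinstance(child, str) and child[0] < threshold:
            # Weak clade: attach its (already collapsed) children here.
            new_children.extend(child[1])
        else:
            new_children.append(child)
    return (support, new_children)


# Toy tree with two weakly supported clades (0.3 and 0.5).
toy_tree = (1.0, [(0.3, ["A", "B"]),
                  (0.9, ["C", (0.5, ["D", "E"])])])
collapsed = collapse(toy_tree, 0.75)
print(collapsed)  # (1.0, ['A', 'B', (0.9, ['C', 'D', 'E'])])
```

Two trees can then be compared after collapsing: relationships that differ only within collapsed polytomies no longer count as conflicts.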

Our phylogeny identifies many new clades, and supports many 
phylogenetic relationships proposed in previous studies (see detailed 
phylogenetic discussion in the Supplementary Information). 
Congruent with all recent studies, the phylogeny places palaeognaths 
as the sister group to the rest of birds, and the flying tinamous 
(Tinamidae) within the flightless ratites. This tree, however, places 
tinamous as the sister group to cassowary and emu alone (Fig. 1, grey). 
The phylogeny of Galloanserae is exactly congruent with previous
studies (Fig. 1, red).

Within the monophyletic Neoaves, we recover five major clades, 
each of which is the successive sister group to the remaining clades in 
the series (Fig. 1). The Strisores includes the nightjars and their noc- 
turnal relatives with the diurnal swifts and hummingbirds (Fig. 1, 
brown). Four nocturnal lineages—nightjars, a neotropical oilbird- 
potoo clade, frogmouths, and owlet-nightjars—form successive sister 
groups to the diurnal swift and hummingbird clade. 

The Columbaves is a novel clade that consists of two monophyletic 
groups recently identified by Jarvis et al. (Fig. 1, purple). A clade 
consisting of turacos, bustards, and cuckoos (Otidimorphae) is sister 
to a clade consisting of pigeons as the sister group to sandgrouse and 
mesites (Columbimorphae). The third neoavian clade consists of a well 
recognized monophyletic group of core gruiform birds (Gruiformes; 
Fig. 1, yellow), with interrelationships that are consistent with previous
phylogenies.

The Aequorlitornithes is a novel, comprehensive clade of waterbirds, 
including all shorebirds, diving birds, and wading birds (Fig. 1, blue). 
Within this group, the flamingos and grebes are the sister group to
shorebirds, and the sunbittern and tropicbirds are the sister group to
the wading and diving birds (Fig. 1, blue). Other interrelationships 
within these groups are extensively congruent with the results in 
ref. 4 and the work of others (see Supplementary Information). 

The fifth major neoavian clade, which we name Inopinaves, is a very
diverse landbird clade with the same composition as previously
recognized (Telluraves), but with the enigmatic, neotropical hoatzin
(Opisthocomus hoazin) as the sister group to all other landbirds (Fig. 1,
green). The phylogeny of the landbirds shares many points of congruence
with earlier hypotheses, including the relationships of seriemas, falcons,
parrots, and perching birds, and the interrelationships among oscine
songbirds. However, we find that hawks (Accipitriformes) are the sister
group to a new clade including the rest of the landbirds, to be called
Eutelluraves (see Supplementary Information).

Our divergence time analyses employed 19 phylogenetically and
geologically well-constrained fossil calibrations (following recently
proposed best practices), documenting many deep divergences
within the avian crown group (Fig. 1, grey nodes; see Supplementary
Information). Our analysis supports an extremely rapid radiation of
the avian crown group in the wake of the K-Pg mass extinction event
(Fig. 1, Supplementary Figs 6 and 7). Although the post-K-Pg
radiation hypothesis has long been strongly supported by the avian fossil
record, it has so far received little support from molecular
divergence time analyses. The tempo and mode of the extant avian
radiation remain contentious. For example, an alternative calibration
analysis including the fossil Vegavis did not support significantly 
different dates of divergence outside of the Galloanserae (see Supple- 
mentary Information and Supplementary Figs 10-12). Confident 
determination of the age of crown Aves will have to await discoveries 
of Mesozoic stem neognaths and palaeognaths, and detailed assess- 
ments of the influence of soft maximum bound parameterization on 
the age of the deepest avian divergences. 

Our results indicate that the recent genome phylogeny may contain
some erroneous relationships induced by long branch attraction from
sparse taxon sampling. Maximum likelihood analysis of our sequence
data pruned down to a phylogenetically equivalent subsample of
48 species produces relationships along the neoavian ‘backbone’
(Supplementary Fig. 8) that are entirely discordant with the phylogeny
based on our full data set (Fig. 1). This reduced taxon analysis recovers
some of the specific features of the recent genome phylogeny by Jarvis
et al. (Supplementary Fig. 8): for example, the placement of the
pigeons, mesites, and sandgrouse (a subclade of Columbea) outside
of the rest of the Neoaves. Differences in tree topology when taxa are
excluded are to be expected if early internodes in Neoaves are very
short. Adding taxa that have diverged near nodes of interest has been
theoretically demonstrated to constrain the possible historical
substitution patterns, and increase the accuracy of phylogenetic inference.
By increasing our taxon sampling to include all major avian lineages,
we have minimized the possibility that additional taxon sampling
alone will alter the relationships in our tree.

Jarvis et al. also identified a well supported clade consisting of the
hoatzin (Opisthocomus) as the sister group to a crane (Grus) and a
plover (Charadrius) (total evidence nucleotide tree, BS = 0.91, 0.96,
respectively). However, Grus and Charadrius were the only species
sampled from two very diverse neoavian orders: Gruiformes, 185
species; and Charadriiformes, 385 species. Our results indicate that
Opisthocomus is the most ancient bird lineage (~64 million years)
consisting of only a single extant species. Thus, the three taxa placed
in this assemblage by Jarvis et al. represent three of the most ancient
and under-sampled lineages within all birds, indicating the strong
possibility of long branch attraction artefacts. By contrast, these same
groups are represented by 26 species in our analysis, and they do not
form an exclusive clade (Fig. 1).

In addition to providing a new backbone for comprehensive avian
supertrees and comparative evolutionary analyses, this new avian
phylogeny supports many interesting hypotheses about avian
evolution. This phylogeny upholds the hypothesis that the ancestor of the
diurnal swifts and hummingbirds evolved from a clade that had been
predominantly nocturnal for ~10 million years. Although
hummingbirds have acute near-ultraviolet vision, the effect of extended
ancestral nocturnality on the evolution of the visual system in this group of
birds is unknown. Our findings also support the emerging pattern that
landbirds evolved from a raptorial grade. The sister group
relationships of hawks to the rest of the landbirds, of owls to the diverse
coraciimorph clade, and of seriemas and falcons to the parrots and
passerines indicate the persistence of a raptorial ecology among
ancestral landbirds. Lastly, the identification of a new, broadly
comprehensive waterbird–shorebird clade indicates a striking, and previously
unappreciated, level of evolutionary constraint on the ecological
diversification of birds that will be exciting to investigate in the future.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper.

METHODS


Locus selection and probe design. Anchor loci described in ref. 12 were extended 
such that each contained approximately 1,350 bp. In some cases neighbouring loci 
were joined to form a single locus. Also, loci that performed poorly in ref. 12 were 
removed from the locus set. This process produced 394 loci (referred to as the 
version 2 vertebrate loci). Genome coordinates corresponding to these regions in 
the Gallus gallus genome (galGal3, UCSC genome browser) were identified and 
sequences corresponding to this region were extracted (coordinates are available 
in the Zenodo archive (http://dx.doi.org/10.5281/zenodo.28343)). In order to 
improve the capture efficiency for passerines, we also obtained homologous 
sequences for Taeniopygia guttata. After aligning the Gallus and Taeniopygia
sequences using MAFFT, alignments were trimmed to produce the final probe
region alignments (alignments available in the Zenodo archive), and probes were
tiled at approximately 1.5× tiling density (probe specification will be made
available upon publication).

Data collection. Data were collected following the general methods of ref. 12 
through the Center for Anchored Phylogenomics at Florida State University 
(http://www.anchoredphylogeny.com). Briefly, each genomic DNA sample was 
sonicated to a fragment size of ~150–350 bp using a Covaris E220
focused-ultrasonicator with Covaris microTUBES. Subsequently, library preparation and
indexing were performed on a Beckman-Coulter Biomek FXp liquid-handling
robot following a protocol modified from ref. 32. One important modification is
a size-selection step after blunt-end repair using SPRIselect beads
(Beckman-Coulter; 0.9× ratio of bead to sample volume). Indexed samples were then pooled
at equal quantities (typically 12-16 samples per pool), and enrichments were 
performed on each multi-sample pool using an Agilent Custom SureSelect kit 
(Agilent Technologies), designed as specified above. After enrichment, the 12 
enrichment pools were pooled in groups of three in equal quantities for sequencing 
on four PE150 Illumina HiSeq2000 lanes (three enrichment pools per lane). 
Sequencing was performed in the Translational Science Laboratory in the 
College of Medicine at Florida State University. 

Data processing. Paired-read merging (Merge.java). Typically, between 50% and 
75% of sequenced library fragments had an insert size between 150 bp and 300 bp. 
As 150 bp paired-end sequencing was performed, this means that the majority of 
the paired reads overlap and thus should be merged before assembly. The
overlapping reads were identified and merged following the methods of ref. 33. In
short, for each degree of overlap for each read pair we computed the probability of
obtaining the observed number of matches by chance, and selected the degree of
overlap that produced the lowest probability, with a P value less than 10⁻¹⁰
required to merge reads. When reads are merged, mismatches are reconciled using 
base-specific quality scores, which were combined to form the new quality scores 
for the merged read (see ref. 33 for details). Reads failing to meet the probability 
criterion were kept separate but still used in the assembly. The merging process 
produces three files: one containing merged reads and two containing the 
unmerged reads. 
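
The overlap test described above can be sketched in a few lines. This is an illustration under stated assumptions, not a reproduction of Merge.java or of the method in ref. 33: it assumes the second read has already been reverse-complemented, that a chance match between two bases has probability 0.25, and that a one-sided binomial tail is the probability being minimized. It also ignores base quality scores, which the actual pipeline uses to reconcile mismatches. The names `tail_prob`, `best_overlap`, and `merge_pair` are hypothetical.

```python
from math import comb


def tail_prob(n, m, p=0.25):
    """P(X >= m) for X ~ Binomial(n, p): chance of m or more random matches."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m, n + 1))


def best_overlap(read1, read2, min_overlap=10):
    """Return (probability, overlap length) for the least chance-like overlap."""
    best = (1.0, 0)
    for k in range(min_overlap, min(len(read1), len(read2)) + 1):
        # Compare the last k bases of read1 with the first k bases of read2.
        matches = sum(a == b for a, b in zip(read1[-k:], read2[:k]))
        p = tail_prob(k, matches)
        if p < best[0]:
            best = (p, k)
    return best


def merge_pair(read1, read2, threshold=1e-10):
    """Merge two reads if their best overlap passes the P-value threshold."""
    p, k = best_overlap(read1, read2)
    if p >= threshold:
        return None                 # keep the reads separate
    return read1 + read2[k:]        # simplification: read1 wins any mismatch


r1 = "ACGTTGCAAGGCTTAACCGGATCGATTACG"
r2 = r1[10:] + "TTTTGGGGCC"         # true overlap of 20 bases
merged = merge_pair(r1, r2)
```

Under these assumptions a perfect overlap must span at least 17 bases to pass the 10⁻¹⁰ threshold, since 0.25¹⁷ is roughly 6 × 10⁻¹¹.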

Assembly (Assembler.java). The reads were assembled into contigs using an 
assembler that makes use of both a divergent reference assembly approach to 
map reads to the probe regions and a de novo assembly approach to extend the 
assembly into the flanks. The reference assembler uses a library of spaced 20-mers 
derived from the conserved sites of the alignments used during probe design. A 
preliminary match was called if at least 17 of 20 matches existed between a spaced
k-mer and the corresponding positions in a read. Reads obtaining a preliminary
match were then compared to an appropriate reference sequence used for probe
design to determine the maximum number of matches out of 100 consecutive
bases (all possible gap-free alignments between the read and the reference were
considered). The read was considered mapped to the given locus if at least
55 matches were found. Once a read was mapped, an approximate alignment
position was estimated using the position of the spaced 20-mer, and all 60-mers
existing in the read were stored in a hash table used by the de novo assembler. 
The de novo assembler identifies exact matches between a read and one of the 60- 
mers found in the hash table. Simultaneously using the two levels of assembly 
described above, the three read files were traversed repeatedly until an entire pass 
through the reads produced no additional mapped reads. 
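
The second mapping criterion above (at least 55 matches within 100 consecutive bases over all gap-free alignments) can be sketched as follows. This is a minimal illustration, not the Assembler.java implementation: the function names are hypothetical, the spaced 20-mer pre-filter is omitted, and reads shorter than the window are simply scored over their full overlap.

```python
def max_window_matches(read, ref, window=100):
    """Best match count in any `window` consecutive aligned bases, gap-free."""
    best = 0
    # An offset places read[i] against ref[i + offset]; try every offset.
    for offset in range(-(len(read) - 1), len(ref)):
        start = max(0, -offset)
        end = min(len(read), len(ref) - offset)
        hits = [read[i] == ref[i + offset] for i in range(start, end)]
        if not hits:
            continue
        running = sum(hits[:window])
        best = max(best, running)
        # Slide the window along this diagonal, updating the count in O(1).
        for j in range(window, len(hits)):
            running += hits[j] - hits[j - window]
            best = max(best, running)
    return best


def is_mapped(read, ref, min_matches=55):
    """Apply the 55-of-100 criterion to one read and one reference."""
    return max_window_matches(read, ref) >= min_matches


reference = "ACGT" * 50
read = reference[20:140]
```

Trying every offset is quadratic in sequence length, which is presumably why a cheap k-mer pre-filter is applied first.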

For each locus, mapped reads were then grouped into clusters using 60-mer
pairs observed in the reads mapped to that locus. In short, a list of all 60-mers 
found in the mapped reads was compiled, and the 60-mers were clustered if found 
together in at least two reads. The 60-mer clusters were then used to separate the 
reads into clusters for contig estimation. Relative alignment positions of reads 
within each cluster were then refined in order to increase the agreement across 
the reads. Up to one gap was also inserted per read if needed to improve the 
alignment. Note that given sufficient coverage and an absence of contamination,
each single-copy locus should produce a single assembly cluster. Low coverage
(leading to a break in the assembly), contamination, and gene duplication can all
lead to an increased number of assembly clusters. A whole-genome duplication,
for example, would increase the number of clusters to two per locus.

Consensus bases were called from assembly clusters as follows. For each site an 
unambiguous base was called if the bases present were identical or if the poly- 
morphism of that site could be explained as sequencing error, assuming a binomial 
probability model with the probability of error equal to 0.1 and alpha equal to 0.05. 
If the polymorphism could not be explained as sequencing error, the ambiguous 
base was called that corresponded to all of the observed bases at that site (for 
example, ‘R’ was used if ‘A’ and ‘G’ were observed). Called bases were soft-masked 
(made lowercase) for sites with coverage lower than five. A summary of the 
assembly results is presented in a spreadsheet in the electronic data archive
(http://dx.doi.org/10.5281/zenodo.28343; Prum_AssemblySummary_Summary.xlsx).
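
The consensus-calling rule above can be sketched as follows. This is one plausible reading, stated as an assumption: a minority base is attributed to sequencing error when the binomial probability of seeing at least that many copies among the reads (error rate 0.1) is not below alpha = 0.05. The function names are hypothetical, and input is assumed to be a non-empty string of uppercase A, C, G, and T observed at one site.

```python
from math import comb

# Standard IUPAC nucleotide ambiguity codes, keyed by sorted base set.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def upper_tail(n, m, p):
    """P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m, n + 1))


def call_base(bases, error=0.1, alpha=0.05, min_cov=5):
    """Call a consensus character for one site from the observed bases."""
    n = len(bases)
    counts = {b: bases.count(b) for b in sorted(set(bases))}
    top = max(counts, key=counts.get)       # ties broken alphabetically
    real = {top}
    for base, count in counts.items():
        # Keep a minority base only if error alone is an unlikely explanation.
        if base != top and upper_tail(n, count, error) < alpha:
            real.add(base)
    call = IUPAC["".join(sorted(real))]
    return call.lower() if n < min_cov else call   # soft-mask low coverage
```

With ten reads, a single discordant base is absorbed as error, whereas an even split between A and G yields the ambiguity code R.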

Contamination filtering (IdentifyGoodSeqsViaReadsMapped.r,
GatherALLConSeqsWithOKCoverage.java). In order to filter out possible low-level
contaminants, consensus sequences derived from very low coverage assembly clusters
(<10 reads) were removed from further analysis. After filtering, consensus
sequences were grouped by locus (across individuals) in order to produce sets 
of homologues. 

Orthology (GetPairwiseDistanceMeasures.java, plotMDS5.r). Orthology was 
then determined for each locus as follows. First, a pairwise distance measure 
was computed for pairs of homologues. To compute the pairwise distance between 
two sequences, we computed the percent of 20-mers observed in the two sequences 
that were found in both sequences. Note that the list of 20-mers was constructed 
from consecutive 20-mers as well as spaced 20-mers (every third base), in order to 
allow increased levels of sequence divergence. Using the distance matrix, we 
clustered the sequences using a neighbour-joining algorithm, but allowing at most 
one sequence per species to be in a given cluster. Clusters containing fewer than 
50% of the species were removed from downstream processing. 
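
The alignment-free measure described above can be sketched as follows. This is an illustration, not GetPairwiseDistanceMeasures.java: it reads "the percent of 20-mers observed in the two sequences that were found in both" as the size of the intersection divided by the size of the union of the two 20-mer sets, and it builds spaced 20-mers from every third base (so each spans 58 positions). Both readings, and all function names, are assumptions.

```python
def kmers(seq, k=20, step=1):
    """Set of k-mers sampled every `step` bases (step=1 gives ordinary k-mers)."""
    span = (k - 1) * step + 1
    return {seq[i:i + span:step] for i in range(len(seq) - span + 1)}


def kmer_profile(seq, k=20):
    """Consecutive and spaced k-mers, tagged so the two kinds never collide."""
    return {(step, kmer) for step in (1, 3) for kmer in kmers(seq, k, step)}


def shared_fraction(seq_a, seq_b, k=20):
    """Fraction of all observed k-mers that occur in both sequences."""
    a, b = kmer_profile(seq_a, k), kmer_profile(seq_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

A dissimilarity suitable for neighbour-joining could then be taken as one minus this fraction, although the text does not state which transformation was applied.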

Alignment (MAFFT). Sequences in each orthologous set were aligned using
MAFFT v7.023b with “--genafpair” and “--maxiterate 1000” flags.

Alignment Trimming (TrimAndMaskRawAlignments3). The alignment for
each locus was then trimmed/masked using the following procedure. First, each
alignment site was identified as ‘good’ if the most common character observed was 
present in >40% of the sequences. Second, 20 bp regions of each sequence that 
contained <10 good sites were masked. Third, sites with fewer than 12 unmasked 
bases were removed from the alignment. Lastly, entire loci were removed if both 
outgroups or more than 40 taxa were missing. This filter yielded 259 trimmed loci 
containing fewer than 2.5% missing characters overall. 
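
The first three trimming steps above can be sketched as follows. Several details are not specified in the text and are assumptions here: gaps are ignored when finding the most common character, windows are non-overlapping (with a short final window reaching back to full length), a position counts towards a window's good sites only when the sequence carries the consensus character at a good site, and masked bases are written as `N`. The function name is hypothetical, and the locus-level filter on missing taxa is omitted.

```python
from collections import Counter


def trim_alignment(seqs, good_frac=0.40, window=20, min_good=10, min_unmasked=12):
    """Mask unreliable regions of each sequence, then drop sparse columns."""
    n_sites = len(seqs[0])

    # Step 1: consensus character per site, or None if the site is not 'good'.
    consensus = []
    for j in range(n_sites):
        column = [s[j] for s in seqs if s[j] != "-"]
        char, count = Counter(column).most_common(1)[0] if column else (None, 0)
        consensus.append(char if count > good_frac * len(seqs) else None)

    # Step 2: mask windows in which a sequence agrees with too few good sites.
    masked = []
    for s in seqs:
        chars = list(s)
        for start in range(0, n_sites, window):
            stop = min(start + window, n_sites)
            start = max(0, stop - window)
            good = sum(chars[j] == consensus[j] for j in range(start, stop))
            if good < min_good:
                for j in range(start, stop):
                    if chars[j] != "-":
                        chars[j] = "N"
        masked.append(chars)

    # Step 3: keep only columns with enough unmasked, non-gap bases.
    keep = [j for j in range(n_sites)
            if sum(row[j] not in "-N" for row in masked) >= min_unmasked]
    return ["".join(row[j] for j in keep) for row in masked]


toy = ["ACGT", "ACGT", "ACGA", "TTTT"]
trimmed = trim_alignment(toy, window=4, min_good=3, min_unmasked=3)
```

In the toy call the thresholds are scaled down to fit a four-column alignment; the divergent fourth sequence is masked in full while every column is retained.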

Model selection and phylogenetic inference. To minimize the overall model
complexity while accurately accounting for substitution processes, we performed 
a partition-model sensitivity analysis with the development version of 
PartitionFinder v2.0 (ref. 13), sensu'*, and compared a complex partition-model 
(one partition per gene) to a heuristically optimized (relaxed clustering with the 
RAxML option for accelerated model selection) partition-model using BIC. Based 
on a candidate pool of potential partitioning strategies that spanned a single 
partition for the entire data set to a model allowing each locus to represent a 
unique partition, the latter approach suggested that 75 partitions of our data set 
represented the best-fitting partitioning scheme, which reduced the number of 
necessary model parameters by 71%, and hugely decreased computation time. 

We analysed each individual locus in RAxML v8.0.20 (ref. 18), and then the
concatenated alignment, using the two partitioning strategies identified above 
with both maximum likelihood and Bayesian based approaches in RAxML 
v8.0.20, and ExaBayes v1.4.2 9 (ref. 34). For each RAxML analysis, we executed 
100 rapid bootstrap inferences and thereafter a thorough ML search using a 
GTR+Γ model of nucleotide substitution for each data set partition. Although
this may potentially over-parameterize a partition with respect to substitution
model, the influence of this form of model over-parameterization has been found
to be negligible in phylogenetic inference. For the Bayesian analyses, we ran four
Metropolis-coupled ExaBayes replicates for 10 million generations, each with 
three heated chains, and sampling every 1,000 generations (default tuning and 
branch swap parameters; branch lengths among partitions were linked). 
Convergence and proper sampling of the posterior distribution of parameter 
values were assessed by checking that the effective sample sizes of all estimated 
parameters and branch lengths were greater than 200 in the Tracer v1.6 software
(most were greater than 1,000), and by using the ‘sdsf’ and ‘postProcParam’ tools
included with the ExaBayes package to ensure the average standard deviation of 
split frequencies and potential scale reduction factors across runs were close to 
zero and one, respectively. Finally, to check for convergence in topology and clade 
posterior probabilities, we summarized a greedily refined majority-rule consensus 
tree (default) from 10,000 post burn-in trees using the ExaBayes ‘consense’ tool for 
each run independently and then together. Analyses of the reduced data set refer- 
enced in the main text were conducted using the same partition-model as the full 
data set. 
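
The split-frequency diagnostic mentioned above can be illustrated with a short sketch. This is not the ExaBayes ‘sdsf’ tool: the input format (one dictionary per run mapping a split label to its posterior frequency), the population standard deviation, and the `min_freq` cutoff for ignoring rare splits are all assumptions made for the example.

```python
from statistics import pstdev


def asdsf(runs, min_freq=0.1):
    """Average standard deviation of split frequencies across runs."""
    splits = set().union(*runs)             # every split seen in any run
    deviations = []
    for split in splits:
        freqs = [run.get(split, 0.0) for run in runs]
        if max(freqs) >= min_freq:          # skip splits that are rare everywhere
            deviations.append(pstdev(freqs))
    return sum(deviations) / len(deviations) if deviations else 0.0


run_a = {"AB|CD": 1.0}
run_b = {"AB|CD": 0.5, "AC|BD": 0.5}
```

Values near zero indicate that independent runs are sampling the same clades at the same frequencies.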

To explore variation in gene tree topology and to look for outliers that might
influence combined analysis, we calculated pairwise Robinson-Foulds (RF) and
Matching Splits (MS) tree distances implemented in TreeCmp. We then
visualized histograms of tree distances and multidimensional scaling plots in R,
and estimated neighbour-joining ‘trees-of-trees’ in the Phangorn R package 
sensu lato”“°. Using RF and MS distances, outlier loci were identified as those 
that occurred in the top 10% of pairwise distances for >30 comparisons to other 
loci (~10%) in the data set. We also identified putative outlier loci using the 
kdetrees.complete function of the kdetrees R package. All three methods
identified 13 of the same loci as potential outliers; however, removal of these loci from
the analysis had no effect on estimated topology or branch lengths.
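
The distance-based outlier rule above can be sketched as follows. This is an illustration, not the TreeCmp or R code used in the study: gene trees are represented as sets of bipartition labels, the RF distance is taken as the size of the symmetric difference between two such sets, and ties at the cutoff are counted as falling in the top fraction. The function names and input format are assumptions.

```python
def rf_distance(splits_a, splits_b):
    """Robinson-Foulds distance: number of splits found in only one tree."""
    return len(set(splits_a) ^ set(splits_b))


def outlier_loci(dist, top_frac=0.10, min_hits=30):
    """Indices of loci with more than `min_hits` distances in the top fraction.

    `dist` is a symmetric matrix (list of lists) of pairwise tree distances.
    """
    n = len(dist)
    pairs = sorted(dist[i][j] for i in range(n) for j in range(i + 1, n))
    n_top = max(1, round(top_frac * len(pairs)))
    cutoff = pairs[-n_top]                  # smallest distance in the top fraction
    hits = [sum(dist[i][j] >= cutoff for j in range(n) if j != i)
            for i in range(n)]
    return [i for i, h in enumerate(hits) if h > min_hits]


toy = [
    [0, 2, 2, 2, 10],
    [2, 0, 2, 2, 10],
    [2, 2, 0, 2, 10],
    [2, 2, 2, 0, 10],
    [10, 10, 10, 10, 0],
]
flagged = outlier_loci(toy, top_frac=0.4, min_hits=2)
```

In the toy matrix the thresholds are scaled down so that the single divergent locus (index 4) is the only one flagged.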

Coalescent species tree analyses. Although fully parametric estimation (for
example, *BEAST, see ref. 42) of a coalescent species tree with hundreds of genes
and hundreds of taxa is not currently possible, we estimated species trees using
three gene-tree summation methods that have been shown to be statistically
consistent under the multispecies coalescent model. First, we used the
STRAW web server to estimate bootstrapped species trees using the STAR
and NJst algorithms (also available through STRAW). The popular
MP-EST method cannot currently work for more than ~50 taxa. STAR takes rooted
gene trees and uses the average ranks of coalescence times to build a distance
matrix from which a species tree is computed with the neighbour-joining
method. By contrast, NJst applies the neighbour-joining method to a distance
matrix computed from average gene-tree internode distances, and relaxes the
requirement for input gene trees to be rooted.
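
The distance matrix that NJst feeds to neighbour-joining can be illustrated with a small sketch. This is not the STRAW implementation: each unrooted gene tree is represented only by its non-trivial bipartitions (one side of each, as a set of taxa), the internode distance between two taxa is counted as the number of internal branches separating them (conventions that count nodes instead differ by a constant), and every taxon is assumed to be present in every gene tree. The function name is hypothetical, and the neighbour-joining step itself is omitted.

```python
from itertools import combinations


def average_internode_distances(gene_trees, taxa):
    """Mean number of internal branches separating each taxon pair."""
    totals = {pair: 0 for pair in combinations(sorted(taxa), 2)}
    for splits in gene_trees:
        for a, b in totals:
            # A split separates a and b when exactly one of them is on this side.
            totals[(a, b)] += sum((a in side) != (b in side) for side in splits)
    return {pair: total / len(gene_trees) for pair, total in totals.items()}


# Two conflicting four-taxon gene trees: ((A,B),(C,D)) and ((A,C),(B,D)).
trees = [[frozenset("AB")], [frozenset("AC")]]
distances = average_internode_distances(trees, "ABCD")
```

Because only branch counts are used, the gene trees need neither branch lengths nor a root.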

We also summarized a species tree with the ASTRAL 4.7.6 algorithm. With 
simulated data, ASTRAL has been shown to outperform concatenation or other 
summary methods under certain amounts of incomplete lineage sorting. For
very large numbers of taxa and genes, ASTRAL uses a heuristic search to find the 
species tree that agrees with the largest number of quartet trees induced by the set 
of input gene trees. For analysis with ASTRAL, we also attempted to increase the 
resolution of individual gene trees (Supplementary Fig. 2) by generating supergene 
alignments using the weighted statistical binning pipeline of refs 47, 48 with a 
bootstrap score of 0.75 as a bin threshold. 

STAR, NJst (not shown), and the binned ASTRAL (Supplementary Fig. 3) 
analysis produced virtually identical inferences when low support branches 
(<0.75) were collapsed, and differed only with respect to the resolution of a few 
branches. NJst resolved the Passeroidea (Fringilla plus Spizella) as the sister group 
to a paraphyletic sample of Sylvioidea (Calandrella, Pycnonotus, and Sylvia), while 
STAR does not resolve this branch. Comparing STAR/NJst to ASTRAL, we find 
five additional differences: (1) within tinamous, STAR/NJst resolves Crypturellus 
as sister to the rest of the tinamous, whereas ASTRAL resolves Crypturellus as 
sister to Tinamus (similar to ExaBayes/RAxML); (2) STAR/NJst resolves pigeons
as sister to a clade containing Mesitornithiformes and Pteroclidiformes, while
ASTRAL does not resolve these relationships; (3) STAR/NJst fails to resolve
Oxyruncus and Myiobius as sister genera, while ASTRAL does (similar to
RAxML/ExaBayes); (4) in STAR/NJst, bee-eaters (Merops) are resolved as the
sister group to coraciiforms (congruent with ref. 4), while ASTRAL resolves 
bee-eaters as sister to the rollers (Coracias) (similar to RAxML/ExaBayes); 
(5) lastly, in STAR/NJst, buttonquail (Turnix) is resolved as sister to the most 
inclusive clade of Charadriiformes not including Burhinus, Charadrius, 
Haematopus, and Recurvirostra, while in ASTRAL, buttonquail is resolved as sister 
to a clade containing Glareola, Uria, Rynchops, Sterna, and Chroicocephalus (sim- 
ilar to RAxML/ExaBayes). 

Although lower level relationships detected with concatenation are generally
recapitulated in the species trees, few of the higher level, or interordinal,
relationships are resolved. This lack of resolution of the gene-tree species-tree based
inferences relative to the inferences based on concatenation is not surprising,
as it is increasingly recognized that the phylogenetic information content
required to resolve the gene-tree histories of individual loci becomes scant at
deep timescales. Despite our extensive taxon sampling and the slow rate of
nucleotide substitution that characterizes loci captured using anchored
enrichment, no single locus was able to fully resolve a topology, and this lack of
information will challenge the accuracy of any coalescent-based summary
approach relative to concatenation. Finally, all summation methods tested
here assume a priori that the only source of discordance among gene trees is deep
coalescence, and violations of this assumption may introduce systematic error in
phylogeny estimation.

Phylogenetic informativeness. Site-specific evolutionary rates, λi…j, were calculated
for each locus using the program HyPhy in the PhyDesign web interface
in conjunction with a guide chronogram generated by a nonparametric rate
smoothing algorithm applied to our concatenated RAxML tree. Using these rates
to predict whether an alignment will yield correct, incorrect, or no resolution of a 
given node, we quantified the probability of phylogenetically informative changes
(ψ) contributing to the resolution of the earliest divergences in Neoaves.
Estimates generated under a three character state model reveal that the majority
of loci have a high value of ψ, and suggest a high potential for most loci and
partitions containing multiple loci (assigned by PartitionFinder) to correctly
resolve this internode. The potential for resolution as a consequence of
phylogenetic signal is therefore high relative to the potential for saturation and misleading
inference induced by stochastic changes along the subtending lineages
(Supplementary Fig. 4a).

To assess the information content of the loci across the entire topology, we
profiled their phylogenetic informativeness (PI) (Supplementary Fig. 4b). There
was considerable variation in PI across loci (Supplementary Fig. 4). In all cases, the
loci with the lowest values of ψ are characterized by substantially lower (60–90%)
values of PI, rather than sharp declines in their PI profiles. The absence of a sharp
decline in the PI profile suggests that a lack of phylogenetic information, rather
than rapid increases in homoplasious sites, underlies low values of the probability of
signal ψ.

Because declines in PI can be attributed to increases in homoplasious site
patterns, we further assessed the phylogenetic utility of data set partitions by
quantifying the ratio of PI at the most recent common ancestor of Neoaves to the
PI at the most recent common ancestor of Aves (Supplementary Fig. 4c). Values of
this ratio that are less than 1 correspond to a rise in PI towards the root. Values
close to 1 correspond to fairly uniform PI. Values greater than 1 correspond to a
decline in PI towards the root. Sixty-six out of 75 partitions demonstrated less than
a 50% decline in PI, and only six partitions demonstrated a decline of PI
greater than 75% (Supplementary Fig. 4c). As all but a few nodes in this study 
represent divergences younger than the crown of Neoaves, these ratios of PI 
suggest that the predicted impact of homoplasy on our topological inferences 
should be minimal. 
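
The relationship between the PI ratio and a percentage decline can be made explicit with a small worked example. The text does not define how the decline was computed, so the formula below, which expresses the drop from the Neoaves node to the Aves node relative to the Neoaves value, is an assumption, as is the function name.

```python
def pi_ratio_and_decline(pi_neoaves, pi_aves):
    """Return (ratio, fractional decline in PI towards the root).

    ratio   = PI at the Neoaves ancestor / PI at the Aves ancestor
    decline = (pi_neoaves - pi_aves) / pi_neoaves = 1 - 1 / ratio
    A negative decline means PI rises towards the root (ratio < 1).
    """
    ratio = pi_neoaves / pi_aves
    return ratio, 1 - 1 / ratio


ratio, decline = pi_ratio_and_decline(2.0, 1.0)
```

Under this reading, a ratio of 2 corresponds to a 50% decline and a ratio of 4 to a 75% decline.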

As PI profiles do not directly predict the impact of homoplasious site patterns
on topological resolution, we evaluated probabilities of ψ for focal nodes using
both the concatenated data set and individual loci that span the variance in
locus lengths. Concordant with expectations from the PI profiles, all
quantifications strongly support the prediction that homoplasy will have a minimal impact
on topological resolution for the concatenated data set across a range of tree depths
and internode distances (ψ = 1.0 for all nodes), while individual loci vary in their
predicted utility (Supplementary Fig. 4d). As the guide tree does not represent a
true known tree, we additionally quantified ψ across a range of tree depths and
internode distances to test if our predictions of utility are in line with general
trends in the data. Concordant with our results above, the concatenated data set
is predicted to be of high phylogenetic utility at all timescales (ψ = 1.0 for all
nodes), while the utility of individual loci begins to decline for small internodes
at deep tree depths (Supplementary Fig. 5).

Estimating a time-calibrated phylogeny. We estimated a time-calibrated tree
with a node dating approach in BEAST 1.8.1 (ref. 42) that used 19 well justified
fossil calibrations phylogenetically placed by rigorous, apomorphy-based
diagnoses (see the descriptions of avian calibration fossils in the Supplementary
Information). We used a starting tree topology based on the ExaBayes inference
(Fig. 1), and prior node age calibrations that followed a lognormal parametric
distribution based on occurrences of fossil taxa. To prevent BEAST from exploring
topology space and only allow estimation of branch lengths, we turned off the
subtree-slide, Wilson-Balding, and narrow and wide exchange operators.
Finally, we applied a birth–death speciation model with default priors.

As rates of molecular evolution are significantly variable across certain bird
lineages, we applied an uncorrelated relaxed clock (UCLN) to each partition
of the data set where rates among branches are distributed according to a
lognormal distribution. All dating analyses were performed without crocodilian
outgroups to reduce the potential of extreme substitution rate heterogeneity to bias
rate and consequent divergence time estimates of the UCLN model.

All calibrations were modelled using soft maximum age bounds to allow for the
potential of our data to overwhelm our user-specified priors. Soft maximum
bounds are the preferred method for assigning upper limits on the age of
phylogenetic divergences. As effective priors necessarily reflect interactions between
user-specified priors, topology, and the branching model, they may not precisely
reflect the user-specified priors. To correct for this potential source of error, we
carefully examined the effective calibration priors by first running the prepared 
BEAST XML without any nucleotide data (until all ESS values were above 200). 
We then iteratively adjusted our user-defined priors until all of the effective priors 
(as examined in the Tracer software) reflected the intended calibration densities. 
Finally, using the compare.phylo function in the Phyloch R package, we examined 
how the inclusion of molecular data influenced the divergence time estimates 
relative to the effective prior (Supplementary Fig. 9; see below). 

Defining priors. Our initial approach was to set a prior’s offset to the age of its
associated fossil; the mean was then manually adjusted such that 95% of the
calibration density fell more recently than the K-Pg boundary at 65 Ma (million
years ago) (the standard deviation was fixed at 1 Ma). In general, priors
constructed this way generated calibration densities that specified their highest density
peak (their mode) about 3-5 million years older than the age of the offset.

We applied a loose gamma prior to the node reflecting the most recent common
ancestor of crown birds: we used an offset of 60.5 Ma (the age of the oldest known
definitive, uncontroversial crown bird fossil; the stem penguin Waimanu), and
adjusted the scale and shape of the prior such that 97.5% of the calibration density
fell more recently than 86.5 Ma (see below and Supplementary Information for
discussion of the >65 Ma putative crown avian Vegavis). This date (86.5 Ma)
reflects the upper bound age estimate of the Niobrara Formation, one of many
richly fossiliferous Mesozoic deposits exhibiting many crownward Mesozoic stem
birds, without any trace of avian crown group representatives. The Niobrara, in
particular, has produced hundreds of stem birds and other fragile skeletons,
without yielding a single crown bird fossil, and therefore represents a robust choice for
a soft upper bound for the root divergence of the avian crown. Previous soft
maxima employed for this divergence have arbitrarily selected the age of other
Mesozoic stem avians (that is, Gansus yumenensis, 110 Ma) that are
phylogenetically stemward of the Niobrara taxa. Although the implementation of very
ancient soft maxima such as the age of Gansus is often done in the name of
conservatism, the extremely ancient divergence dates yielded by such analyses
illustrate the misleading influence of assigning soft maxima that are vastly too
old to be of relevance to the divergence of crown group birds. However, this
problem has been eliminated in some more recent analyses.

All of the fossil calibrations employed in our analysis represent neognaths; 
rootward divergences within Aves (for example the divergence between 
Palaeognathae and Neognathae, and Galloanserae and Neoaves) cannot be con- 
fidently calibrated due to a present lack of fossils representing the palaeognath, 
neognath, galloanserine, and neoavian stem groups. As such, the K-Pg soft bound 
was only applied to comparatively apical divergences within neognaths. Although 
the question of whether major neognath divergences occurred during the 
Mesozoic has been the source of controversy, renewed surveys of Mesozoic 
sediments for definitive crown avians or even possible crown neoavians have been 
unsuccessful (with the possible exception of Vegavis; see Supplementary 
Information), and together with recent divergence dating analyses have cast doubt 
on the presence of neoavian subclades before the K-Pg mass extinction. 
Further, recent work has demonstrated the tendency of avian divergence estimates 
to greatly exceed uninformative priors, resulting in spuriously ancient divergence 
dating results (for example, refs 28, 75, 76, 80). These results motivated our 
implementation of the 65 Ma soft bound for our neoavian calibrations. 

Contrary to expectation, when we compared the effective prior on the entire tree 
to the final summary derived from the posterior distribution of divergence times 
(Supplementary Fig. 9), we found no overall trend of posterior estimated ages post- 
dating prior calibrations. In fact, the inclusion of our molecular data decreases the 
inferred ages of almost all of the deepest nodes in our tree. A similar result has been 
obtained for mammals by using large amounts of nuclear DNA sequences. 
Future work investigating the interplay of the density of genomic sampling and 
the application of various calibration age priors will be indispensable for sensitivity 
analyses to help us further develop a robust timescale of avian evolution. However, 
the pattern of posterior versus prior age estimates observed in our study raises the 
prospect that the new class of data used in this study (that is, semi-conserved 
anchor regions) may exhibit some immunity to longstanding problems associated 
with inferring avian divergence times, such as systematically over-estimating the 
antiquity of extant avian clades. 

Implementing BEAST and summarizing a final calibrated tree. In addition to 
making predictions about the phylogenetic utility of a locus or partition towards 
topological resolution, PI profiles have recently also been used to mitigate the 
influence of substitution saturation on divergence time estimates. Given the 
variance in PI profile shapes for captured loci and their subsequent partition 
assignments (Supplementary Fig. 4c), and observations that alignments and sub- 
sets of data alignments characterized by high levels of homoplasy can mislead 
branch length estimation, we limited our divergence time estimates to 36 
partitions that did not exhibit a decline in informativeness towards the root of 
the tree. We ran BEAST on each partition separately until parameter ESS values 
were greater than 200 (most were greater than 1,000) to ensure adequate posterior 
sampling of each parameter value. After concatenating 10,000 randomly sampled 
post burn-in trees from each of these completed analyses, we summarized a final 
MCC tree with median node heights in TreeAnnotator v1.8.1 (ref. 42). 
Supplementary Fig. 6 shows the full, calibrated Bayesian tree (Fig. 1) with 95% 
HPD confidence intervals on the node ages, and Supplementary Fig. 7 shows the 
distribution of estimated branching times, ranked by median age (using clade 
numbers from Fig. 1). All computations were carried out on 64-core PowerEdge 
Biodiversity increases the resistance of ecosystem 
productivity to climate extremes 
It remains unclear whether biodiversity buffers ecosystems against 
climate extremes, which are becoming increasingly frequent worldwide. 
Early results suggested that the ecosystem productivity of 
diverse grassland plant communities was more resistant, changing 
less during drought, and more resilient, recovering more quickly 
after drought, than that of depauperate communities. However, 
subsequent experimental tests produced mixed results. Here we 
use data from 46 experiments that manipulated grassland plant 
diversity to test whether biodiversity provides resistance during 
and resilience after climate events. We show that biodiversity 
increased ecosystem resistance for a broad range of climate events, 
including wet or dry, moderate or extreme, and brief or prolonged 
events. Across all studies and climate events, the productivity of 
low-diversity communities with one or two species changed by 
approximately 50% during climate events, whereas that of high- 
diversity communities with 16-32 species was more resistant, 
changing by only approximately 25%. By a year after each climate 
event, ecosystem productivity had often fully recovered, or over- 
shot, normal levels of productivity in both high- and low-diversity 
communities, leading to no detectable dependence of ecosystem 
resilience on biodiversity. Our results suggest that biodiversity 
mainly stabilizes ecosystem productivity, and productivity- 
dependent ecosystem services, by increasing resistance to climate 
events. Anthropogenic environmental changes that drive biodiversity 
loss thus seem likely to decrease ecosystem stability, and 
restoration of biodiversity to increase it, mainly by changing the 
resistance of ecosystem productivity to climate events. 
Biodiversity stabilizes ecosystem productivity over time; however, 
it remains unclear whether it does so by providing resistance 
during climate events, resilience (sensu rapid recovery) after climate 
events, or both (Extended Data Fig. 1). Two decades ago, a seminal 
study reported that the ecosystem productivity of diverse grassland 


plant communities was more resistant and more resilient to a major 
drought than that of depauperate communities. However, this study 
had not experimentally manipulated biodiversity, which confounded 
variation in biodiversity with variation in species composition and 
resource availability. Hundreds of biodiversity experiments were 
subsequently conducted, but few of these studies revisited this 
important question, and those that did so found mixed results. 
Further analysis of the original data also produced mixed results. 
Thus, it remains unclear whether biodiversity buffers ecosystems 
against climate extremes, which are becoming increasingly frequent 
worldwide. 

We combined data from 46 experiments that manipulated grassland 
plant diversity and measured productivity across Europe and North 
America (Extended Data Fig. 2 and Extended Data Table 1). We 
classified each year of each experiment as extremely dry, moderately 
dry, normal, moderately wet, or extremely wet (Extended Data Figs 2 
and 3) (Methods). To do this in a globally consistent manner, we used a 
drought index that quantified month-by-month variations in water 
balance over the past century on 0.5° × 0.5° grids 
globally, based on measurements at more than 4,000 weather stations 
worldwide (Extended Data Figs 2 and 3). We defined climate 
extremes (extremely dry or extremely wet) as events occurring less 
frequently than once per decade, based on the historic climate at each 
site over the past century (Methods). Moderately dry and wet events 
were defined as those that had historically occurred between once in 
4 years and once per decade. Normal years included the interquartile 
range of observed water balances. Given these cutoffs, there were 18 
extremely dry, 32 moderately dry, 87 normal, 37 moderately wet, and 
21 extremely wet experiment years that occurred during these bio- 
diversity experiments (Extended Data Figs 2 and 3). Unsurprisingly, 
productivity tended to be lower than normal during dry events and 
higher than normal during wet events (Extended Data Fig. 4), although 
Table 1 | Fixed effect tests and variance component estimates 
(standard error) for linear mixed-effects models 

                                   Resistance             Resilience 
Fixed effects 
  Biodiversity                     Fi 278 = 20.68***      Fyg5 = 0.67 
  Direction                        Fi gi7 = 0.53          Fise9 = 0.15 
  Intensity                        Fi g56 = 1.40          Fis77 = 2.36 
  Biodiversity × intensity         Fy 92.3 = 3.02* 
  Biodiversity × direction                                Fia6.1 = 6.52** 
Variance components 
  Study                            0.37 (0.15)            1.4 x 10°-° (3.5 x 10-8) 
  Study × biodiversity             0.041 (0.022)          0.0067 (0.0096) 
  Study × year                     0.32 (0.074)           0.68 (0.15) 
  Study × biodiversity × year      0.033 (0.011)          0.018 (0.012) 
  Plot                             0.25 (0.038)           9.6 x 10°7 (2.3 x 10-8) 
  Plot × year                      2.1 (0.051)            4.1 (0.099) 
Temporal autocorrelation 
  ρAR1                             0.12 (0.025)           −0.41 (0.020) 

*P < 0.1; **P < 0.05; ***P < 0.001. Direction: 0, dry; 1, wet. Intensity: 0, moderate; 1, extreme. 
Biodiversity: log2(number of species). Study = factor. Year = factor. Plot is defined within studies. Both 
response variables were log-transformed. Non-significant (P > 0.1) interactions were excluded from 
the model. Kenward-Roger approximation is given for denominator degrees of freedom. 


there were exceptions to this general trend (Extended Data Fig. 5). 
Productivity overshot normal levels when recovering during the year 
after extreme (but not moderate) dry and wet events (Extended Data 
Fig. 4), which is consistent with damped oscillations, rather than mono- 
tonic recovery, of productivity after climate extremes (Extended Data 
Fig. 1). Consistent with previous studies, biodiversity increased 
ecosystem stability (Fig. 1a; F1,37.4 = 28.74, P < 0.001). 

We quantified resistance and resilience, using proportional changes 
in productivity from one year to the next, within each experimental 
unit (plot) for each observed climate event (Methods). Linear mixed- 
effects models were used to test whether resistance and resilience 
depend on biodiversity, and how these biodiversity effects depend 
on climate event properties, such as the direction (wet or dry), intensity 
(moderate or extreme), or duration (3-24 months) of climate events, 
while accounting for repeated measurements (Methods). 

Biodiversity increased the resistance of ecosystem productivity to a 
broad range of climate events (biodiversity main effect in Table 1 and 
Fig. 1b). That is, more diverse communities exhibited smaller propor- 
tional changes in productivity during climate events. On average, 
across all studies and climate events, the productivity of low-diversity 
communities with one or two species changed by approximately 50% 
(Ω ≈ 2; Fig. 1b), whereas that of high-diversity communities with 16-32 
species changed by approximately 25% (Ω ≈ 4; Fig. 1b), during 
climate events. Biodiversity increased resistance irrespective of the 
direction (wet or dry) or intensity (moderate or extreme) of climate 
events (all interactions were non-significant, P > 0.05; Table 1). There 
was, however, one marginally significant interaction: biodiversity may 
have increased resistance more during moderate climate events than 
during extreme ones (biodiversity × intensity interaction in Table 1 
and Extended Data Fig. 6). There was substantial variability in the 
effect of biodiversity on resistance among studies and among years 
within studies (see variance components in Table 1, Fig. 1b and 
Extended Data Fig. 7); however, biodiversity increased resistance simi- 
larly in long-term studies that were conducted for at least 9 years, and 
in short-term studies (Methods). 

Examination of the dynamics of recovery shows that, at both low 
and high diversity, productivity had often returned to, or overshot, its 
normal level during the year after a climate event (Extended Data 
Fig. 4). Given this rapidity of recovery both for low- and for high- 
diversity communities, biodiversity may not have a major impact 
on the recovery of ecosystem productivity after climate events, at 
least over the timescales and climate-event intensities considered. 
Indeed, we were unable to detect strong and consistent effects of 
biodiversity on our measure of ecosystem resilience (Table 1 and 
Fig. 1c). Biodiversity decreased resilience after wet events, and 
increased, although non-significantly (see confidence intervals 
for 12-month events shown in Fig. 2), resilience after dry events 
(biodiversity × direction interaction in Table 1 and Fig. 1c). That is, 
less diverse communities recovered closer to normal levels of produc- 
tivity during the year after wet events. On average, across all studies, 
climate events, and levels of biodiversity, productivity moved approximately 
10% closer to normal levels (Δ ≈ 1.1; Fig. 1c) during the 
year after climate events; however, this was often due to greatly over- 
shooting, rather than failing to reach, normal levels of productivity 
(Extended Data Fig. 4). The effect of biodiversity on resilience did 
not vary substantially among studies or among years within studies 
(see relatively small point estimates with large standard errors for 
biodiversity variance components in Table 1 and Extended Data Fig. 8). 

Next, we tested how our results depended on the duration over 
which climate events were defined. To do so, we considered multiple 
Figure 1 | Biodiversity effects on ecosystem stability, and its resistance 
and resilience components. Biodiversity consistently increases ecosystem 
stability (a) and resistance (b), but not resilience (c). Lines are mixed-effects 
model fits for each study (a), or each climate event within each study (b, c) (thin 
lines), or across climate events and studies (thick lines with bands indicating 
95% confidence intervals). Thick lines and bands in c indicate trends averaged 
across both moderate and extreme events for either dry (dashed red lines) 
or wet (solid blue lines) events. Stability measures are unitless. Axes are 
logarithmic. See Table 1 for test statistics and Extended Data Table 1 for 
sample sizes. 
Figure 2 | Effects of biodiversity on stability measures with climate events 
defined over shorter or longer durations. Biodiversity consistently increases 
resistance; however, biodiversity effects on resilience depend on the direction 
(wet or dry) and duration of climate events. Values shown are parameter 
estimates and 95% confidence intervals for biodiversity effects from mixed- 
effects models, with the 12-month values corresponding to the results shown in 
Table 1 and Fig. 1. Values in the upper panel are averaged across both intensities 
and both directions. For clarity, values in the lower panel are slightly offset 
on the x axis. See Extended Data Table 1 for sample sizes. 


versions of the drought index, which aggregated water balances over 
different timescales, ranging from seasonal (3 months) to multi-year 
(24 months) events (Methods). We found that biodiversity consistently 
increased the resistance of ecosystem productivity during climate 
events, irrespective of the duration (3-24 months) of the climate 
event (Fig. 2). Biodiversity had no significant effect on the resilience of 
ecosystem productivity after brief, intra-annual wet or dry climate 
events (Fig. 2). Biodiversity decreased resilience only after prolonged, 
wet climate events that lasted 1 year or more (Fig. 2). The magnitudes 
of biodiversity effects on resistance were substantially larger than those 
on resilience for all but the longest durations (Fig. 2). 

It is difficult, or perhaps impossible, to fully disentangle the resist- 
ance and resilience components of empirical time series, especially 
when there are frequent perturbations. For example, resilience to the 
first of two consecutive climate events could bias estimates of resist- 
ance to the second event. Similarly, resistance to the second of two 
consecutive climate events could bias estimates of resilience to the first 
event. To explore how this might have affected our results, we tested 
whether biodiversity effects on resistance differed between climate 
events that were preceded either by normal or by other climate event 
years, and whether biodiversity effects on resilience differed between 
climate events that were succeeded either by normal or by climate event 
years (Methods). We found that biodiversity increased resistance, 
especially during climate events that were preceded by climate event 
years (biodiversity × consecutive interaction: F1,64.3 = 7.21, P < 0.01) 
(Extended Data Fig. 9), and that biodiversity did not significantly 
impact resilience, regardless of whether a climate event was succeeded 
Figure 3 | Biodiversity effects on productivity during climate events or 
normal years. Lines are mixed-effects model fits for each year within each 
study (thin lines) or across all years and studies (thick lines with bands 
indicating 95% confidence intervals). See Extended Data Fig. 5 for results 
within studies. There was a significant effect of biodiversity on productivity 
(F1,30.6 = 202.4, P < 0.001), a significant effect of event (F4,139.5 = 6.86, 
P < 0.001), and a significant biodiversity × event interaction (F4,124.3 = 3.23, 
P = 0.015). Axes are logarithmic. See Extended Data Table 1 for 
sample sizes. 


by a normal year or another climate event (biodiversity × consecutive 
interaction: F1,39.6 = 2.42, P = 0.13). We also tested whether biodiversity 
significantly influenced resilience when considering only climate 
events that were succeeded by multiple normal years in long-term 
studies that were conducted for at least 9 years, and with resilience 
quantified 2, rather than 1, years after climate events (Methods). We 
again found no detectable effect of biodiversity on resilience 
(F1,10.6 = 0.20, P = 0.66). Thus, biodiversity did not influence 
resilience after 1 or 2 years of unperturbed recovery. 

Our results suggest that greater biodiversity generally provides 
greater resistance. We focused on dimensionless, proportional 
measures of resistance and resilience to allow comparisons of com- 
munities with different levels of productivity. However, absolute mea- 
sures of resistance and resilience might be of interest for some 
applications within particular communities, and do not necessarily 
depend on biodiversity in the same manner (Fig. 3 and Extended 
Data Figs 4 and 5). Given that biodiversity increases productivity, 
more productivity could be lost during dry events, and gained back 
after dry events, in diverse than in depauperate communities. In this 
case, it is also important to note that our analyses show that biodiver- 
sity increased productivity not only during normal years, but also 
during climate events (Fig. 3). 

Our results suggest that biodiversity stabilizes ecosystem productiv- 
ity, and probably productivity-dependent ecosystem services, during 
climate events that are moderate or extreme. Anthropogenic envir- 
onmental changes that drive biodiversity loss will probably decrease 
ecosystem stability by decreasing the resistance of ecosystem productivity 
to climate events. Restoring biodiversity will probably increase 
ecosystem resistance to climate extremes, which are forecast to become 
increasingly frequent as the global climate continues to change. 
METHODS 


Defining ecosystem stability measures. We define measures of resistance and 
resilience that are (1) dimensionless, and thus directly comparable between studies 
and communities with different levels of productivity; (2) symmetric, and thus 
directly comparable between positive and negative perturbations, such as wet and 
dry climate events; (3) applicable to dynamic systems that exhibit either mono- 
tonic recovery or damped oscillations after a perturbation (Extended Data Fig. 1). 
We define resistance as 

$$\Omega = \frac{Y_n}{\left| Y_e - Y_n \right|} \qquad (1)$$

and resilience as 

$$\Delta = \left| \frac{Y_e - Y_n}{Y_{e+1} - Y_n} \right| \qquad (2)$$

where $Y_n$, $Y_e$ and $Y_{e+1}$ are respectively the expected ecosystem productivity 
during normal years (mean across all non-climate event years), during a climate 
event, and during the year after a climate event. Resistance (Ω) indicates the proximity 
of productivity to normal levels during a climate event. For example, if productivity 
is reduced during a drought to half its normal level, then Ω = 2 (Extended 
Data Fig. 1). Resilience (Δ) indicates the rate of return towards normal productivity 
levels after a climate event. If a climate event lowers productivity, greater biomass 
growth rates during recovery lead to greater resilience up until they are sufficiently 
rapid to lead to full recovery of normal levels of productivity during the subsequent 
year. Any biomass growth rates greater than this lead to progressively lower 
resilience because productivity overshoots its normal level. Thus, consistent with 
stability measures used in theoretical biodiversity-stability studies, this measure of 
resilience has a low value, indicating instability, when the deviation of the system 
from normal productivity levels exponentially decays at a slow rate, either via 
monotonic recovery or damped oscillations (Extended Data Fig. 1). For example, 
if during the year after a climate event productivity recovers either from 50 to 75% 
or from 50 to 125% of normal productivity levels, then productivity will have 
returned halfway from perturbed to normal levels, and Δ = 2 (Extended Data 
Fig. 1). The same is true for recovery in the opposite direction after a positive 
deviation: that is, recovery from 150 to 125% or from 150 to 75% of normal 
productivity levels would also give Δ = 2 (Extended Data Fig. 1). The points 
shown in Extended Data Fig. 1 are given by $y_{t=0} = 100$, $y_{t=1} = 100 - 100/\Omega$, 
$y_{t=11} = 100 + 100/\Omega$, and, for all other $t$, $y_t = 100 - (100 - y_{t-1})/\Delta$ for monotonic 
recovery or $y_t = 100 + (100 - y_{t-1})/\Delta$ for damped oscillations, where $y$ is 
productivity. We use a common measure of ecosystem stability, quantified as the 
ratio of the mean to the standard deviation of productivity across years ($\mu/\sigma$). This 
measure of ecosystem stability is dimensionless, and thus directly comparable 
between studies and communities with different levels of productivity. 
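
As an illustration of these definitions, the following minimal sketch computes resistance (equation (1)), resilience (equation (2)) and the stability ratio, and generates the stylized trajectories described for Extended Data Fig. 1. The function names, the 21-year series length and the use of the sample standard deviation are assumptions made for the example; they are not taken from the analysis code used in the study.

```python
from statistics import mean, stdev

def resistance(y_normal, y_event):
    """Omega = Y_n / |Y_e - Y_n| (equation (1)); undefined if Y_e equals Y_n."""
    return y_normal / abs(y_event - y_normal)

def resilience(y_normal, y_event, y_after):
    """Delta = |(Y_e - Y_n) / (Y_{e+1} - Y_n)| (equation (2))."""
    return abs((y_event - y_normal) / (y_after - y_normal))

def stylized_series(omega, delta, years=21, oscillate=False):
    """Dry event in year 1, wet event in year 11, recovery otherwise.
    The number of years is an assumption for this example."""
    y = [100.0]
    for t in range(1, years):
        if t == 1:
            y.append(100.0 - 100.0 / omega)
        elif t == 11:
            y.append(100.0 + 100.0 / omega)
        else:
            sign = 1.0 if oscillate else -1.0
            y.append(100.0 + sign * (100.0 - y[-1]) / delta)
    return y

def stability(series):
    """Mean divided by standard deviation (sample SD assumed here)."""
    return mean(series) / stdev(series)

# Worked examples from the text: productivity halved by drought, then
# recovering to 75% or overshooting to 125% of normal levels.
print(resistance(100, 50))        # 2.0
print(resilience(100, 50, 75))    # 2.0
print(resilience(100, 50, 125))   # 2.0
```

Because both measures are ratios of productivities, they carry no units, and recovery that overshoots normal levels is penalized in the same way as recovery that falls short.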
Identifying wet and dry climate events. Drought occurs when water availability 
remains below normal levels over some period of time. Identifying and quantifying 
droughts requires consideration of water inputs (precipitation) and water 
losses (potential evapotranspiration). Furthermore, doing so in a globally consist- 
ent manner requires standardization of spatially explicit historical trends for water 
balances, to ensure that ‘normal’ and ‘extreme’ conditions are consistently defined 
across sites. Finally, given that ecosystems need not similarly respond to brief or 
prolonged droughts, it is often useful to consider water balances aggregated over a 
range of short to long timescales. 

We used the standardized precipitation-evapotranspiration index (SPEI) to 
consistently identify and quantify wet and dry climate events across field experiments 
over durations ranging from 3 to 24 months. SPEI is a standard normal 
variable for water balances aggregated over a given number of months at a par- 
ticular location. SPEI values are based on month-by-month variations in climate 
over the past century (January 1901 to December 2011), based on monthly 
means of measurements made at more than 4,000 weather stations worldwide, 
and provided on 0.5° × 0.5° grids globally. For example, a value of 
SPEI-12 = −1.28 for August 2005 at a particular location would correspond to a 
level of annual (as indicated by the value of 12) drought (as indicated by the 
negative value) that has historically occurred (between 1901 and 2011) once per 
decade at that location during the months of September to August (Extended Data 
Figs 2 and 3). Similarly, SPEI-3 = 0.67 for August 2005 at a particular location 
would correspond to a level of seasonal wetness that has historically occurred once 
every 4 years at that particular site during the months of June to August (Extended 
Data Figs 2 and 3). 

We extracted SPEI values from SPEIbase raster files for each peak biomass 
harvest at each study site (Extended Data Figs 2 and 3). First, we considered annual 
water balances: SPEI-12. Previous results suggest that primary productivity 
responds to approximately annual water balances in temperate grasslands. We 


classified experiment years as extremely dry, moderately dry, normal, moderately 
wet, and extremely wet (Extended Data Figs 2 and 3). Extreme events (extremely 
dry or extremely wet) were defined as those that historically occurred less fre- 
quently than once per decade. Moderate events were defined as those that histor- 
ically occurred between once in 4 years and once per decade. Normal years were 
defined as those within the interquartile range of historical water balances. Given 
these cutoffs, there were 18 extremely dry, 32 moderately dry, 87 normal, 37 
moderately wet, and 21 extremely wet experiment years that occurred during these 
biodiversity experiments (Extended Data Figs 2 and 3). Thus, 20% of the experi- 
ment years (18 + 21 = 39 out of 195) were identified as extreme events, which 
corresponds to extremely dry events that occur less than once per decade (10% of 
observations) plus extremely wet events that occur less than once per decade (10% 
of observations). Note that there is an unavoidable shifting baseline for compar- 
isons when defining extreme climate events. If we had defined climate extremes 
based only on data from the early (or late) 1900s, then we would probably have 
identified more (or fewer) extreme climate events. 
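
Because SPEI is a standard normal variable, the return periods above translate directly into cutoffs on the SPEI scale. The sketch below shows one way to derive those cutoffs and classify an experiment year; the function name and the handling of values that fall exactly on a cutoff are assumptions made for illustration, not details reported in the study.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
# Once per decade: outermost 10% of each tail (about 1.28 on the SPEI scale).
EXTREME = STD_NORMAL.inv_cdf(0.90)
# Once in 4 years: edge of the interquartile range (about 0.67).
MODERATE = STD_NORMAL.inv_cdf(0.75)

def classify_spei(spei):
    """Assign an experiment year to one of the five event classes."""
    if spei < -EXTREME:
        return "extremely dry"
    if spei < -MODERATE:
        return "moderately dry"
    if spei <= MODERATE:
        return "normal"
    if spei <= EXTREME:
        return "moderately wet"
    return "extremely wet"

# Share of experiment years classified as extreme, using the counts in the text.
counts = {"extremely dry": 18, "moderately dry": 32, "normal": 87,
          "moderately wet": 37, "extremely wet": 21}
extreme_share = (counts["extremely dry"] + counts["extremely wet"]) / sum(counts.values())
print(round(EXTREME, 2), round(MODERATE, 2), extreme_share)
```

The same function can be applied to any SPEI aggregation period, because each version of the index is standardized in the same way.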

Next, we considered how the effects of biodiversity on resistance and resilience 

depended on the duration over which water balances were aggregated. Specifically, 
we re-classified each experiment year as extremely dry, moderately dry, normal, 
moderately wet, and extremely wet years based on other versions of SPEI that 
aggregate water balances over shorter (SPEI-3, SPEI-6, SPEI-9) or longer (SPEI- 
15, SPEI-18, SPEI-21, SPEI-24) periods of time preceding peak biomass harvests, 
and then re-fitted mixed-effects models. 
Statistical analyses. We used linear mixed-effects models to test whether resist- 
ance and resilience depend on biodiversity, and how these biodiversity effects 
depend on climate event properties, such as the direction (wet or dry), intensity 
(moderate or extreme), or duration (3-24 months) of climate events, while 
accounting for repeated measurements. Models were first fitted for annual 
(12-month) climate events (Table 1 and Fig. 1), and then subsequently fitted for 
shorter or longer durations (Fig. 2). Fixed effects were included for biodiversity, 
quantified as the log2(treatment species richness); direction, quantified as a binary 
variable (0, dry; 1, wet); and intensity, quantified as a binary variable (0, moderate; 
1, extreme). All interactions were initially included, and non-significant interactions 
(P > 0.1) were subsequently excluded. Random effects were included for 
a study factor; a study × biodiversity interaction; a study × year interaction; a 
study × biodiversity × year interaction; and a plot (within-study) term. The error 
structure accounted for repeated measurements within experimental units (plots) 
across years. A first-order autoregressive covariance structure provided a better fit 
than a compound symmetry (split-plot-in-time) covariance structure, according 
to the Akaike information criterion. For all models, the response variable was 
log-transformed to meet model assumptions. 

Models were fitted with the asreml function in the asreml package in R, 
and results were extracted with the test.asreml function in the pascal package 
(https://github.com/pascal-niklaus/pascal) in R. After model simplification, as 
described above on the basis of significance of fixed effects and Akaike information 
criterion comparisons of random effect and covariance structures, fixed effects 
were specified as ~biodiversity + direction + intensity + interaction (where 
interaction = biodiversity:intensity for resistance, and interaction = biodiversity: 
direction for resilience), random effects as ~study/(biodiversity*year) + plot, and 
the error structure as rcov = ~id(plot):ar1(year). These mixed-effects models 
were fitted for annual resistance and resilience (Fig. 1 and Extended Data Figs 7 
and 8), and for all eight durations of resistance and resilience (Fig. 2). The model 
for productivity only differed in the specification of fixed effects, with a factor for 
climate event (levels of ‘extreme dry’, ‘moderate dry’, ‘normal’, ‘moderate wet’, and 
‘extreme wet’) instead of the direction and intensity terms (Fig. 3 and Extended 
Data Figs 4 and 5). The biodiversity × event interaction was significant and 
retained in the productivity model (Fig. 3). 

Models were fitted for resistance for all studies for which there were observa- 
tions of productivity during both normal and climate event years (Extended Data 
Figs 3 and 7). Models for resilience were fitted for all studies for which there were 
observations during normal, climate event, and post-climate event years, except 
where the only normal year was also the only post-event year because in this case 
Yn = Ye+1 and resilience is undefined (Extended Data Figs 3 and 8). 

Species richness treatments were randomly assigned to experimental units 
(plots). Sample sizes were chosen within individual experiments (Extended Data 
Table 1) to ensure adequate power to detect an effect of richness on productivity. 
Testing whether biodiversity effects differed between short- and long-term 
studies. Given that many of these studies were conducted for only a few years, we 
tested whether our results differed between short- and long-term studies. We did 
so by adding a two-way biodiversity × study duration interaction, and a study 
duration main effect, to the models shown in Table 1, where study duration was a 
binary variable with a value of one for the six studies conducted for at least 9 years 
(Extended Data Table 1), and a value of zero for all other studies. We found 
similar results between short- and long-term studies, as indicated by non-significant 
interactions between biodiversity and study duration for both resistance 
(F1,16.5 = 0.02, P = 0.90) and resilience (F1,23.7 = 0.66, P = 0.42). 

Testing whether biodiversity effects differed between categorical versus con- 
tinuous measures of climate event intensity. We used a categorical specification 
of climate intensity (moderate or extreme) throughout because there were often 
complex nonlinear relationships between biomass production and SPEI within sites 
(Extended Data Fig. 5). However, our categorical specification incurs some informa- 
tion loss, so we also tested whether results were similar when the models shown in 
Table 1 were fitted using the absolute value of the SPEI-12 index in place of the 
binary intensity variable. We found similar results when we considered this con- 
tinuous measure of climate event intensity. That is, biodiversity increased resistance 
(F1,28.0 = 20.38, P < 0.001) and did not affect resilience (F),.5 = 0.66, P = 0.44). 
Disentangling resistance and resilience. It is difficult, or perhaps impossible, to 
fully disentangle the resistance and resilience components of empirical time series, 
especially when there are frequent perturbations. For example, resilience to the 
first of two consecutive climate events could bias estimates of resistance to the 
second event, and resistance to the second of two consecutive climate events could 
bias estimates of resilience to the first event. To explore how this affected our 
results, we added a two-way biodiversity × consecutive interaction to the models 
shown in Table 1, and a main effect of consecutive, where consecutive was a binary 
variable with a value of 1 indicating non-consecutive climate events (that is, 
normal year before event for resistance, normal year after event for resilience), 
and 0 otherwise. We also tested whether biodiversity significantly influenced 
resilience when considering only climate events that were succeeded by multiple 
normal years in long-term studies that were conducted for at least 9 years, and with 
resilience quantified 2, rather than 1, years after climate events. To do so, we refitted 
the model shown in Table 1, but with resilience quantified using $Y_{e+2}$ rather 
than $Y_{e+1}$ in equation (2). 

Robustness of results to monoculture exclusion. Given that monocultures are 
rare in nature, we tested whether our results depended on inclusion of monocul- 
ture plots. We found similar results when we excluded monocultures. That is, 
biodiversity increased resistance and did not significantly affect resilience when we 
refitted the models shown in Table 1 after excluding monocultures (biodiversity 
effect on resistance: $F_{1,29.2}$ = 7.25, P = 0.014; biodiversity effect on resilience: 
$F$ = 0.21, P = 0.665). 
Extended Data Figure 1 | Contrasting ecosystem productivity responses to 
climate events for low or high levels of resistance (Ω) and resilience (Δ). 
In these stylized examples, productivity is decreased by a dry climate event 
during year one, is increased by a wet climate event during year 11, and is 
otherwise recovering back towards normal productivity levels either 
monotonically (black dashed lines and open triangles) or via damped 
oscillations (solid grey lines and filled circles). Ecosystem stability (μ/σ) 
depends on both resistance and resilience. See Methods for definitions of 
resistance and resilience. 
Extended Data Figure 2 | Map of study site locations (bottom) and 
frequency of climate events (top). Bottom: locations for all 46 studies (yellow 
triangles) and an example of spatial variation in water balance, where SPEI-12 
was classified as in the top panel. August 2005 was chosen for this example 
because many experiments were underway and harvested during this particular 
month of this particular year (Extended Data Table 1). The spatial patterns 
of wet and dry climate events shown on this map would differ at other times 
(that is, during a different month or year) and for climate events defined over 
other durations (that is, based on water balances aggregated over more or 
fewer than the preceding 12 months). There were multiple experiments at some 
sites (Extended Data Table 1), thus some symbols completely overlap on this 
map. Top: cutoffs for bins correspond to events occurring every 1 in 4 years 
(±0.67) or every 1 in 10 years (±1.28). 
Extended Data Figure 3 | Classification of extreme dry, moderate dry, 
normal, moderate wet, and extreme wet years for each year of the 46 
experiments. The 12-month version of the SPEI is shown, where positive 
values indicate wetter than normal water balances (precipitation minus 
potential evapotranspiration) during the 12-month time interval preceding and 
including the month of peak biomass harvest. For example, if peak biomass 
was harvested in September, then SPEI-12 accounts for the water balance from 
the previous October to September. Drought index values are based on month-by-month 
variations in climate over the past century (January 1901 to 
December 2011), based on monthly means of measurements made at more 
than 4,000 weather stations worldwide, and provided on 0.5° × 0.5° 
grids globally. Dashed lines show cutoffs for 1 in 4 (±0.67) or 1 in 10 
(±1.28) year events. Seven experiments that included only normal years 
(Agrodiversity Germany a, Agrodiversity Ireland a, Czech Republic) or that did 
not include any normal years (Agrodiversity Poland a, Agrodiversity Spain a, 
Iowa BioGEN, North Dakota a) were excluded from subsequent analyses 
because it was not possible to compare perturbed with normal productivity 
levels for these studies. 
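
The five-way classification of years described in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name is hypothetical, and treating a value exactly at a cutoff as belonging to the more intense class is an assumption. The last lines check that the two cutoffs are close to the 0.75 and 0.90 quantiles of a standard normal distribution, which is what one would expect if SPEI values are standardized.

```python
from statistics import NormalDist

MODERATE_CUTOFF = 0.67  # |SPEI-12| cutoff for 1-in-4-year events (from the caption)
EXTREME_CUTOFF = 1.28   # |SPEI-12| cutoff for 1-in-10-year events (from the caption)


def classify_year(spei12: float) -> str:
    """Classify a year from its SPEI-12 value; positive values are wetter than normal."""
    magnitude = abs(spei12)
    if magnitude < MODERATE_CUTOFF:
        return "normal"
    direction = "wet" if spei12 > 0 else "dry"
    # Assumption: values exactly at a cutoff fall into the more intense class.
    intensity = "extreme" if magnitude >= EXTREME_CUTOFF else "moderate"
    return f"{intensity} {direction}"


# Quantiles of a standard normal distribution that the cutoffs approximate.
standard_normal = NormalDist()
quantile_075 = standard_normal.inv_cdf(0.75)
quantile_090 = standard_normal.inv_cdf(0.90)

for value in (-1.5, -0.9, 0.3, 0.9, 1.4):
    print(value, classify_year(value))
```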
Extended Data Figure 4 | Productivity during and after both climate events 
and normal years for monocultures and mixtures of 16 species. Values 
shown are predicted means and 95% confidence intervals from the mixed-effects 
model. Productivity tends to be decreased during dry events and 
increased during wet events. This trend is reversed during the year after climate 
events. This pattern of overshooting normal levels of productivity during 
recovery 1 year after climate events is consistent with damped oscillations, 
rather than monotonic recovery (Extended Data Fig. 1). Relatively high 
productivity after extreme droughts could be due to increased nutrient 
availability and/or decreased abundance of herbivores as a result of reduced 
plant productivity during the drought. This might be especially true for low-diversity 
communities, which have the lowest productivity during drought, 
possibly explaining why biodiversity increases resilience after extremely dry 
years (Fig. 1c). Similarly, relatively low productivity after extremely wet 
years might be due to decreased nutrient availability and/or increased 
abundance of enemies as a result of increased plant productivity during the wet 
event. This might be especially true for high-diversity communities, which 
have the highest productivity during wet years, possibly explaining why 
biodiversity decreases resilience after extremely wet years (Fig. 1c). Dashed 
horizontal lines show normal productivity levels. 
Extended Data Figure 5 | Biodiversity-productivity relationships for each year of each study, including normal years and climate events. Points are plot-level 
values and lines are mixed-model fits (Fig. 3). 
Extended Data Figure 6 | A marginally significant interaction between 
biodiversity and intensity (moderate or extreme). Table 1 indicates that 
productivity was marginally more resistant to moderate than to extreme 
climate events, especially at high biodiversity. All other interactions were 
non-significant (P > 0.10). Axes are logarithmic. 
Extended Data Figure 7 | Biodiversity effects on the resistance of 
productivity to climate extremes. Shown for each study for which there were 
observations of productivity during both normal ($Y_n$) and climate event ($Y_e$) 
years (Extended Data Fig. 3). Points are plot-level values and lines are mixed-model 
fits (Fig. 1b). Axes are logarithmic. 
Extended Data Figure 8 | Biodiversity effects on the resilience of 
productivity to climate extremes. Shown for each study for which there were 
observations during normal ($Y_n$), climate event ($Y_e$), and post-climate 
event ($Y_{e+1}$) years. Quantifying resilience requires more information (that is, 
$Y_{e+1}$) than quantifying resistance, thus we were unable to quantify resilience 
for eight of the studies shown in Extended Data Fig. 7. Specifically, we were 
unable to quantify resilience for studies where the only climate event occurred 
during the last year of the study (Extended Data Fig. 3) because in this case 
$Y_{e+1}$ is unknown, and for studies where the only normal year was also the only 
post-event year (Extended Data Fig. 3) because in this case $Y_n = Y_{e+1}$ and 
resilience is undefined. Points are plot-level values and lines are mixed-model 
fits (Fig. 1c). Axes are logarithmic. 
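
To make the "undefined resilience" case concrete, here is a small sketch. The equations from the Methods are not reproduced in this section, so the proportional forms used below (normal productivity over the deviation during the event, and the deviation during the event over the deviation one year later) are an assumption, and the function names are hypothetical. The point of the sketch is only that a ratio whose denominator is the post-event deviation from normal cannot be computed when post-event productivity equals normal productivity.

```python
from typing import Optional


def resistance(y_normal: float, y_event: float) -> Optional[float]:
    """Assumed form: normal productivity divided by the absolute deviation during the event."""
    deviation = abs(y_event - y_normal)
    return None if deviation == 0 else y_normal / deviation


def resilience(y_normal: float, y_event: float, y_after: float) -> Optional[float]:
    """Assumed form: deviation during the event divided by the deviation remaining afterwards.

    Returns None when productivity after the event equals normal productivity,
    because the ratio is then undefined (division by zero).
    """
    remaining = abs(y_after - y_normal)
    return None if remaining == 0 else abs(y_event - y_normal) / remaining


# Hypothetical productivity values (arbitrary units), for illustration only.
print(resistance(500.0, 400.0))          # deviation of 100 from a normal level of 500
print(resilience(500.0, 400.0, 475.0))   # deviation shrinks from 100 to 25
print(resilience(500.0, 400.0, 500.0))   # undefined case described in the caption
```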
Extended Data Figure 9 | Biodiversity effects on the resistance of 
productivity to climate events that were preceded either by a climate event 
(green lines) or by a normal year (black lines). The significant interaction 
shown here indicates that biodiversity increased resistance more during climate 
events preceded by years with climate events than during climate events 
preceded by normal years ($F$ = 7.21, P < 0.01). Axes are logarithmic. The 
sequence of climate events at each site is shown in Extended Data Fig. 3. 
Extended Data Table 1 | Study details 

Study  Years  Number of years  Month of peak biomass harvest  Number of plots  Levels of species richness 
Agrodiversity Belgium 2003-2005 3 11 30 1,4 
Agrodiversity Canada 2005-2007 3 8 30 1,4 
Agrodiversity France 2004-2006 3 10 30 1,4 
Agrodiversity Germany a 2005-2006 2 10 30 1,4 
Agrodiversity Iceland a 2003-2005 3 8 30 1,4 
Agrodiversity Iceland b 2004-2006 3 8 30 1,4 
Agrodiversity Ireland a 2004-2006 3 11 29 1,4 
Agrodiversity Italy 2003-2005 3 12 30 1,4 
Agrodiversity Lithuania a 2003-2005 3 10 30 1,4 
Agrodiversity Lithuania b 2004-2006 3 10 30 1,4 
Agrodiversity Lithuania c 2004-2006 3 10 30 1,4 
Agrodiversity Netherlands 2004-2006 3 10 30 1,4 
Agrodiversity Norway a 2004-2006 3 8 30 1,4 
Agrodiversity Norway b 2003-2005 3 9 30 1,4 
Agrodiversity Norway c 2003-2005 2 10 30 1,4 
Agrodiversity Norway d 2004-2006 3 8 30 1,4 
Agrodiversity Poland a 2004-2006 3 10 30 1,4 
Agrodiversity Spain a 2004-2006 3 7 30 1,4 
Agrodiversity Sweden a 2003-2005 3 9 30 1,4 
Agrodiversity Sweden b 2004-2006 3 9 30 1,4 
Agrodiversity Sweden c 2004-2006 3 9 30 1,4 
Agrodiversity Switzerland 2003-2005 3 10 30 1,4 
Agrodiversity Wales a 2003-2006 4 10 30 1,4 
Agrodiversity Wales b 2004-2006 3 11 30 1,4 
BIODEPTH Germany 1996-1998 3 8 60 1,2,4,8,16 
BIODEPTH Greece 1997-1999 3 5 52 1,2,4,8,18 
BIODEPTH Ireland 1996-1998 3 8 70 1,2,3,4,8 
BIODEPTH Portugal 1997-1999 3 5 56 1,2,4,8,14 
BIODEPTH Sheffield UK 1996-1998 3 9 54 1,2,4,8,12 
BIODEPTH Silwood UK 1996-1998 3 9 66 1,2,4,8,11 
BIODEPTH Sweden 1996-1998 3 8 58 1,2,4,8,12 
BIODEPTH Switzerland 1995-1997 3 8 64 1,2,4,8,32 
Cedar Creek BioCON 1998-2011 14 8 74 1,4,9,16 
Cedar Creek Biodiversity 1996-2011 16 8 168 1,2,4,8,16 
Czech Republic 2003-2005 3 6 96 1,3,6,12 
EVENT 2005-2010 6 9 15 2,4 
Iowa BioGEN 2007-2009 3 8 64 1,4 
Jena 2003-2011 9 9 82 1,2,4,8,16,60 
North Dakota a 2003-2005 3 8 15 2,8,16 
North Dakota b 2003-2005 3 8 15 2,8,16 
North Dakota c 2003-2005 3 8 15 2,8,16 
Texas Evenness 2001-2010 10 10 75 1,2,4,8 
Texas MEND 2008-2010 3 10 52 1,9 
Virginia 2008-2011 4 8 64 1,2,4,6,10 
Wageningen Biodiversity 2000-2010 11 8 102 1,2,4,8 
Wageningen CLUE 1996-2007 12 8 10 4,15 
Hedgehog actively maintains adult lung quiescence 
and regulates repair and regeneration 


Tien Peng, David B. Frank, Rachel S. Kadzik, Michael P. Morley, Komal S. Rathi, Tao Wang, Su Zhou, 
Lan Cheng, Min Min Lu & Edward E. Morrisey 


Postnatal tissue quiescence is thought to be a default state in the 
absence of a proliferative stimulus such as injury. Although pre- 
vious studies have demonstrated that certain embryonic develop- 
mental programs are reactivated aberrantly in adult organs to 
drive repair and regeneration’”, it is not well understood how 
quiescence is maintained in organs such as the lung, which displays 
a remarkably low level of cellular turnover**. Here we demonstrate 
that quiescence in the adult lung is an actively maintained state and 
is regulated by hedgehog signalling. Epithelial-specific deletion of 
sonic hedgehog (Shh) during postnatal homeostasis in the murine 
lung results in a proliferative expansion of the adjacent lung 
mesenchyme. Hedgehog signalling is initially downregulated 
during the acute phase of epithelial injury as the mesenchyme pro- 
liferates in response, but returns to baseline during injury resolu- 
tion as quiescence is restored. Activation of hedgehog during acute 
epithelial injury attenuates the proliferative expansion of the lung 
mesenchyme, whereas inactivation of hedgehog signalling prevents 
the restoration of quiescence during injury resolution. Finally, we 
show that hedgehog also regulates epithelial quiescence and regen- 
eration in response to injury via a mesenchymal feedback mech- 
anism. These results demonstrate that epithelial-mesenchymal 
interactions coordinated by hedgehog actively maintain postnatal 
tissue homeostasis, and deregulation of hedgehog during injury 
leads to aberrant repair and regeneration in the lung. 

The Hedgehog (Hh) pathway coordinates tissue-tissue interactions 
in multiple organs during embryonic development through paracrine 
activation of smoothened (Smo)-mediated downstream signalling 
events’. We have previously demonstrated that Shh expressed by 
nascent lung endoderm progenitors coordinates cardiopulmonary 
mesoderm progenitor differentiation into various cardiac and lung 
mesenchymal cell lineages*. To determine whether Hh signalling con- 
tinues to be active in the postnatal adult lung, we used the SAhreGtP 
reporter (ref. 9) and our data show that Shh is expressed in the adult lung 
epithelium predominantly in the Scgb1a1+ club epithelial cells in the 
proximal airway (Fig. 1a), with scattered expression in ciliated epithelium 
(Extended Data Fig. 1a) and the Sftpc+ alveolar type II epithelial 
cells (Fig. 1b). The downstream transcriptional effector and target of 
hedgehog, Gli1 (ref. 10), is expressed predominantly in mesenchymal 
cells adjacent to the proximal airway and pulmonary artery 
(Fig. 1c), with scattered expression in the alveolar interstitium as previously 
reported (Fig. 1d; ref. 11). Lineage tracing in the adult lung with 
Gli1^creERT2:R26R^mTmG animals (ref. 12) showed that Gli1+ Hh-responsive 
cells express several mesenchymal markers including Pdgfra, Pdgfrb, 
vimentin, S100A4, and Col1a1 (Fig. 1e-h, Extended Data Fig. 1b, c). 
Gli1+ Hh-responsive mesenchymal cells do not contribute markedly 
to the smooth muscle lineage under homeostatic conditions, with the 
exception of rare venous smooth muscle within the proximal 
pulmonary venous myocardium (Extended Data Fig. 1d-i) and 
myofibroblasts during fibrotic injury (Extended Data Fig. 1j). Gli1+ 
cells do not contribute to cells of the haematopoietic lineage in the lung 
(Extended Data Fig. 1k). Gli1+ cells remain quiescent up to 12 weeks 
after lineage labelling, with little to no notable expansion or Ki67 
labelling (Fig. 1i-k). 

We deleted Shh using the Scgb1a1^cre driver, which is active in the 
airway epithelium, to define the importance of Hh signalling in the 
postnatal lung (Extended Data Fig. 2a, b; ref. 13). Examination of 
Scgb1a1^cre:Shh^flox/flox adult lungs reveals mesenchymal expansion 
and increased mesenchymal cell proliferation surrounding the 
airway epithelium (Fig. 2a-d, m and Extended Data Fig. 2c-h). 
Thus, epithelial-specific loss of Shh in the postnatal lung is sufficient 
to induce cellular proliferation in the adjacent mesenchyme. 

To address the cell-autonomous role of Hh signalling in adult lung 
mesenchyme, we deleted Smo within Gli1+ Hh-responsive cells in the 
adult lung and followed their proliferative response. Four weeks after 
Smo deletion, lineage-traced Gli1+ mesenchymal cells expanded relative 
to controls and exhibited increased cell proliferation (Fig. 2e-h, n). 
We also deleted Smo using the mesenchyme-specific Pdgfrb^cre driver (ref. 14), 
and Pdgfrb^cre:Smo^flox/flox:R26R^mTmG adult mutants exhibit increased 
cell proliferation and expansion of the Pdgfrb-derived population surrounding 
the airways and in the alveolar interstitium (Fig. 2i-l, o and 
Extended Data Fig. 3a-i). Adult Pdgfrb^cre:Smo^flox/flox:R26R^mTmG 
mutants older than 6 months exhibit elevated pulmonary arterial pressures, 
indicating that loss of Hh signalling at the bronchovascular 
interface causes pulmonary hypertension (Extended Data Fig. 3j-l). 

Figure 1 | The lung epithelium signals to the adjacent mesenchyme via 
paracrine Hh signalling during normal homeostasis. a, b, The Shh 
ligand is expressed in airway epithelium marked by Scgb1a1 (a), with 
scattered expression in the Sftpc+ alveolar epithelium (b). c, d, The Gli1^lacZ 
reporter is expressed in the mesenchyme adjacent to the airway and 
pulmonary artery (c) with scattered activation in the alveolar interstitium 
(d). e-h, Lineage-traced Gli1+ cells express Pdgfra (e), Pdgfrb (f), 
vimentin (g), and Col1a1 (h). i-k, Lineage-traced Gli1+ cells do not 
expand in the adult lung after chase periods of 2 days (i), 4 weeks (j) and 
12 weeks (k), with negligible expression of the cell cycle marker Ki67 (i-k, 
arrowheads). AW, airway; V, blood vessel; Tm, tamoxifen. Scale bars, 
100 µm. Images representative of 3 animals with 5 sections examined 
per animal. 

Figure 2 | Postnatal activation of Hh signalling is required to maintain 
lung mesenchymal quiescence. a-d, m, Deletion of Shh from airway 
epithelium increases proliferation in the mesenchyme surrounding the 
airway. e-h, n, Deletion of Smo within Gli1+ cells causes proliferation and 
mesenchymal expansion as noted by increased Ki67 expression. i-l, o, Deletion 
of Smo within Pdgfrb+ mesenchyme shows increased mesenchymal 
proliferation in the adult lung. AW, airway; V, blood vessel; CKO, conditional 
knockout; blue represents DAPI counterstaining. Scale bars, 100 µm; 
*P < 0.05. Data represent n = 3 animals per group with 5 sections analysed per 
animal. In vitro lung mesenchyme studies represent technical triplicates, 
with BrdU assay representative of three separate experiments. A one-sided 
t-test was used to determine statistical significance, with the centre value representing 
the mean and the error bar representing s.d. 

We then assessed the transcriptome of isolated adult lung mesen- 
chymal cells expressing the activated SmoM2 mutant form of 
smoothened, resulting in increased Gli1 expression (Extended Data 
Fig. 4a-i; ref. 15). Unbiased gene ontology (GO) analysis showed highest 
enrichment in the subset of genes involved in “mitotic nuclear 
division”, with most of these transcripts downregulated in Hh-acti- 
vated fibroblasts (Extended Data Fig. 4j, Supplementary Tables 1 
and 2), suggesting that Hh activation attenuates cell cycle progression 
in the adult lung mesenchyme. 

Previous studies have demonstrated that Pdgfr signalling promotes 
postnatal mesenchymal proliferation (refs 16, 17), and that Pdgfr isoforms are 
expressed in the adult lung mesenchyme (Fig. 1, Extended Data Fig. 4c, 
d, j). Therefore, we assessed the interaction between Hh and Pdgf 
signalling using a gain-of-function mutant of Pdgfrb (Pdgfrb*, here- 
after referred to as Pdgfrb°°")"”. Activation of Pdgfrb within Hh- 
responsive Gli1 lung mesenchymal cells results in their proliferative 
expansion (Extended Data Fig. 4m-p, s). However, concurrent 
expression of SmoM2 attenuates the Pdgfrb-induced expansion of 
Gli1+ Hh-responsive mesenchyme (Fig. 2q-s). Activation of Hh signalling 
in isolated lung mesenchymal cells in vitro (derived from 
UBC^creERT2:R26R^SmoM2 animals) attenuates the proliferation induced 
by exogenous Pdgf-BB (platelet-derived growth factor-BB) (Extended 
Data Fig. 4l). 

Next, we assessed the expression of Hh signalling components during 
airway epithelial injury with naphthalene (ref. 18). Acute naphthalene 
injury caused a reduction in Hh activation as assessed by decreased 
Gli1^lacZ reporter activity in the mesenchyme surrounding the 
airway, reduced expression of Shh and Gli1 transcripts, and decreased 
expression of GFP in the ShnrectP reporter (Fig. 3a-g, Extended Data 
Fig. 5). Chronic repetitive bleomycin caused a similar reduction in Hh 
activation following injury (Extended Data Figs 5 and 6). Thus, Hh 
signalling is downregulated in response to epithelial injury in the lung, 
and is not upregulated as has been previously reported (refs 19, 20). Of note, 
these results correlate with the loss of Shh-expressing epithelium 
after injury. 

Figure 3 | Hh signalling modulates the acute mesenchymal response to 
epithelial injury. a-f, Naphthalene injury downregulates Gli1^lacZ expression 
(n = 2 animals per group) and g, Shh and Gli1 expression noted by quantitative 
PCR (n = 3 animals per group). NS, not significant. h-l, Gli1+ lung cells 
undergo proliferative expansion shortly after naphthalene injury as measured 
by Ki67 expression. m-q, Clonal analysis of Gli1+ lung cells at single cell 
resolution demonstrates clonal expansion three days after injury. 
r-v, Conditional activation of Smo (SmoM2) within lineage-traced Gli1+ lung 
cells attenuates the proliferative expansion that follows epithelial injury with 
naphthalene. AW, airway; V, blood vessel. Scale bars, 100 µm; *P < 0.05. 
Blue staining represents DAPI (h-k, q-t) except in confetti experiments where 
white represents TO-PRO-3 DNA counterstaining (m-p). Data represent 
n = 3 animals per group with 5 sections analysed per animal. Clonal analysis 
represents >50 clones analysed in 4 animals. Error bars, mean ± s.d. 


To assess the behaviour of Gli1+ lung cells after epithelial injury, we 
exposed Gli1^creERT2:R26R^mTmG adult animals to tamoxifen followed by 
a one-week washout period before inducing lung epithelial injury with 
naphthalene. Hh-activated Gli1+ lung cells rapidly undergo proliferative 
expansion after naphthalene injury (Fig. 3h-l). Utilizing the 
Gli1^creERT2:R26R^Confetti mice for stochastic multicolour clonal analysis, 
we demonstrate that individual Gli1+ cells clonally expand after 
naphthalene injury (Fig. 3m-q). Reconstitution of Hh activation with 
SmoM2 during acute epithelial injury attenuates the normal expansion 
of mesenchyme following injury (Fig. 3r-v). In the bleomycin injury 
model, Hh signalling is also downregulated within Gli1+ mesenchymal 
cells after injury, which is similarly attenuated by the expression of 
activated SmoM2 (Extended Data Fig. 6). 

Despite an initial reduction in Hh activation during naphthalene 
injury, Shh and Gli1 expression return to homeostatic levels three 
months following injury (Fig. 4a-e) as the Shh-expressing bronchial 
epithelium is reconstituted (Extended Data Fig. 7a-d). Mesenchymal 
quiescence is also gradually restored after 2-3 months (Fig. 4f-m, r). 
Deletion of Smo within Gli1+ Hh-responsive cells prevented the restoration 
of mesenchymal quiescence as these Gli1+ mesenchymal cells 
surrounding the airways continue to proliferate 2 months after 
naphthalene injury (Fig. 4n-q, s). These data show that Hh activation 
is dynamically regulated after epithelial injury and is inversely correlated 
with mesenchymal proliferation as injury repair and regeneration 
progresses (Fig. 4t). 

Bronchial Scgb1a1+ secretory cells have tremendous proliferative 
capacity to regenerate damaged epithelial airways after injury 
(Extended Data Fig. 7a-d). Therefore, we assessed whether Hh activation 
in the mesenchyme alters secretory epithelial proliferation and 
regeneration in conditional Hh loss and gain of function mutants. 
Scgb1a1^cre:Shh^flox/flox and Pdgfrb^cre:Smo^flox/flox:R26R^mTmG mutants 
demonstrate a significant increase in bronchial epithelial proliferation 
(Fig. 5a-f), while the Gli1^creERT2:Smo^flox/flox:R26R^mTmG mutants show a 
trend towards increased epithelial proliferation during normal 
homeostasis (Fig. 5g-i). To determine whether mesenchymal 
activation of Hh signalling affects epithelial proliferation and regeneration 
after injury, we activated and deleted Smo within Gli1+ 
mesenchyme during naphthalene-induced epithelial injury. 
Activation of Hh results in a marked loss of Scgb1a1+ secretory epithelium 
2 months after naphthalene injury relative to multi-ciliated 
epithelium (TubbIV+) (Fig. 5k, m), which does not undergo cellular 
turnover with naphthalene injury. In contrast, inactivation of Hh 
signalling promotes excessive Scgb1a1+ club cell regeneration, leading 
to bronchial hyperplasia (Fig. 5l, m). Next, we generated lung organoids 
from Scgb1a1-derived epithelium and cultured them in the presence 
or absence of isolated lung mesenchyme (Extended Data Fig. 7e-n). 
Organoids co-cultured with mesenchyme predominantly formed 
colonies expressing markers of the secretory lineage with a small fraction 
generating alveolar epithelial cells, while those without mesenchyme 
failed to form colonies (Extended Data Fig. 7e-k). Activation of 
SmoM2 in the co-cultured lung mesenchyme reduced the number and 
size of the epithelial colonies (Extended Data Fig. 7l-n). These data 
show that Hh promotes epithelial quiescence via a mesenchymal feedback 
mechanism, possibly by downregulating stromal factors necessary 
for epithelial proliferation. 

Figure 4 | Hh signalling modulates restoration of quiescence during injury 
resolution in the lung. a-c, Hh activation returns to homeostatic levels 
three months after naphthalene injury in Gli1^lacZ lungs (n = 3 animals per 
group, 5 sections analysed per animal) and d, e, Shh and Gli1 expression returns 
to normal as noted by qPCR (n = 3 animals per group). f-m, r, Gli1+ lung 
cells undergo proliferative expansion shortly after naphthalene injury but return 
to quiescence by 2-3 months after injury. n-q, s, Conditional deletion of Smo in 
n = 3 animals per group with 5 sections analysed per animal unless 
otherwise noted. Error bars, mean ± s.d. 

Figure 5 | Hh signalling regulates epithelial quiescence via mesenchymal 
feedback. a-f, Deletion of Shh from the proximal secretory epithelium and 
deletion of Smo within cells derived from Pdgfrb+ mesenchyme increase 
proliferation of Scgb1a1+ club cells during homeostasis. g-i, Inducible deletion 
of Smo within Gli1+ lung mesenchymal cells results in a non-significant 
trend towards increased Scgb1a1+ club cell proliferation. j-m, Activation of 
SmoM2 in Gli1+ mesenchyme results in impaired regeneration of Scgb1a1+ 
cells after naphthalene injury, whereas deletion of Smo in the mesenchyme 
induces excessive expansion of Scgb1a1+ cells resulting in bronchial 
hyperplasia. AW, airway. Scale bars, 100 µm; *P < 0.05. Data represent n = 3 
animals per group with 5 sections analysed per animal. Error bars, mean ± s.d. 

In this study, we have demonstrated that the lung epithelium act- 
ively maintains mesenchymal quiescence through paracrine Hh sig- 
nalling, which also regulates a feedback loop to maintain epithelial 
quiescence. This finding stands in contrast to the known role of Shh 
in promoting cell proliferation during tissue development as well as its 
role in promoting tumorigenesis in adults. While previous reports 
have suggested that Hh signalling is pro-mitogenic in the adult 
lung’’°”°, our study is the first report to utilize multiple genetic 
models to assess Hh function in the adult lung in vivo and in vitro. 
Our data indicate that certain signalling pathways such as Hh maintain 
a balance between proliferation and quiescence during lung homeosta- 
sis and regeneration. Our studies reveal that disruption of this balance 
upon injury can lead to changes in expansion of the mesenchyme, 
which may disrupt epithelial regeneration after injury or in disease 
(Extended Data Fig. 8). 
METHODS 


No statistical methods were used to predetermine sample size. 

Animals. Generation and genotyping of the Shh’? (ref. 9), Gli1^lacZ (ref. 10), 
Gli1^creERT2 (ref. 12), Scgb1a1^cre (ref. 13), Pdgfrb^cre (ref. 14), R26R^Confetti (ref. 24), 
Smo^flox/flox (ref. 25), Shh^flox/flox (ref. 26), R26R^mTmG (ref. 27), R26R^SmoM2 (ref. 15), 
Pdgfrb* (ref. 17) and UBC^creERT2 (ref. 28) lines have been previously described. The 
animals were housed and treated in accordance with the IACUC protocol 
approved at the University of Pennsylvania. Animals aged 
8-12 weeks were used for the experiments, with sexes balanced between 
groups. Tamoxifen (Sigma) was dissolved in corn oil and administered intraperitoneally 
at 200 mg kg⁻¹ per day × 3 days for lineage tracing studies, with the 
exception of clonal analysis studies with the R26R^Confetti reporter, where only 
one dose of tamoxifen was given at 200 mg kg⁻¹. 

Histological analysis. Mouse lungs were inflated and fixed in 2% paraformaldehyde, 
dehydrated in a series of increasing ethanol concentration washes, embedded 
in paraffin and sectioned. Antibodies used were anti-SM22α (goat anti-SM22α 
1:200 Abcam), GFP (goat anti-GFP 1:100 Abcam, rabbit anti-GFP 1:100 
Molecular Probes), Scgb1a1 (goat anti-Scgb1a1 1:20 Santa Cruz), SPC (rabbit anti-SPC 
1:500 Chemicon), Pdgfra (rabbit anti-Pdgfra 1:50 Cell Signaling), Pdgfrb 
(rabbit anti-Pdgfrb 1:100 Cell Signaling), vimentin (rabbit anti-vimentin 1:100 
Santa Cruz), collagen type 1 (rabbit anti-Col1 1:500 Abcam), Ki67 (rabbit anti-Ki67 
1:50 Abcam), PCNA (mouse anti-PCNA 1:50 Biocare), PO4-Histone H3 
(mouse anti-PO4-Histone H3 1:200 Cell Signaling), TubbIV (mouse anti-TubbIV 
1:20 Biogenex), S100A4 (rabbit anti-S100A4 1:200 Abcam). LacZ staining of lungs 
was performed as previously described. Slides were imaged on a Zeiss LSM 710 
confocal microscope and analysed in ImageJ software. 

Animal injury experiments. For acute naphthalene injury, mice were given 
300 mg kg⁻¹ of naphthalene (Sigma) dissolved in corn oil via intraperitoneal 
injection. For chronic bleomycin injury, mice were given 50 U kg⁻¹ of pharmaceutical-grade 
bleomycin (Hospira) dissolved in PBS via intraperitoneal injection 
twice a week for four weeks. 
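
As a worked example of the per-kilogram doses quoted in the Animals and Animal injury experiments paragraphs, the following sketch converts them to absolute amounts for one animal. The 25 g body mass is a hypothetical value chosen only for illustration and does not come from the source; the function name is likewise hypothetical.

```python
def dose_mg(body_mass_g: float, dose_mg_per_kg: float) -> float:
    """Convert a per-kilogram dose to the absolute amount (mg) for one animal."""
    return dose_mg_per_kg * body_mass_g / 1000.0


TAMOXIFEN_MG_PER_KG = 200.0    # per day, from the text
LINEAGE_TRACING_DAYS = 3       # from the text
NAPHTHALENE_MG_PER_KG = 300.0  # single injection, from the text
example_mass_g = 25.0          # hypothetical mouse body mass, not from the source

tamoxifen_per_injection = dose_mg(example_mass_g, TAMOXIFEN_MG_PER_KG)
tamoxifen_total = tamoxifen_per_injection * LINEAGE_TRACING_DAYS
naphthalene_amount = dose_mg(example_mass_g, NAPHTHALENE_MG_PER_KG)

print(tamoxifen_per_injection, tamoxifen_total, naphthalene_amount)
```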

Measurement of pulmonary artery pressure. Following anaesthesia with Avertin, 
the trachea was cannulated, and mice were ventilated using a MiniVent Type 845 
(Harvard Apparatus). The chest cavity was opened to expose the heart, and a Micro- 
Tip Catheter Transducer SPR-1000 (Millar Instruments) was inserted into the right 
ventricle. Systolic right ventricle pressure was measured as a surrogate for systolic 
pulmonary artery pressure, recorded on a PowerLab 4/30 instrument (ADIns- 
truments), and analysed using Chart 5 Pro software (ADInstruments). Pressure mea- 
surements associated with heart rates outside the range of 300-500 beats per minute 
were excluded from analysis. For each mouse, 2-4 measurements were analysed, each 
corresponding to the average of 10-20 individual data points. The experimenter was 
blinded to the mouse genotype and three mice of each genotype were examined. 
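
The exclusion and averaging rule for the pressure measurements can be sketched as below. This is an illustration under stated assumptions, not the analysis performed in Chart 5 Pro: the function name is hypothetical, pressures are assumed to be in mmHg, and heart rates exactly at 300 or 500 beats per minute are assumed to be accepted.

```python
from statistics import mean
from typing import List, Tuple

HEART_RATE_RANGE = (300.0, 500.0)  # beats per minute, from the text


def mean_rv_systolic_pressure(measurements: List[Tuple[float, float]]) -> float:
    """Average pressure over measurements whose heart rate is within the accepted range.

    Each measurement is a (heart_rate_bpm, pressure_mmHg) pair, where the pressure
    is assumed to already be the average of 10-20 individual data points.
    """
    low, high = HEART_RATE_RANGE
    kept = [pressure for heart_rate, pressure in measurements if low <= heart_rate <= high]
    if not kept:
        raise ValueError("no measurements within the accepted heart-rate range")
    return mean(kept)


# Hypothetical measurements for one mouse; the second is excluded (heart rate too high).
example = [(450.0, 24.0), (520.0, 40.0), (380.0, 26.0)]
print(mean_rv_systolic_pressure(example))
```
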
Clonal analysis of Gli??? RI6R°"™" Jungs. For clonal analysis of 
Gli ?:R26R'!"" mice, lungs were inflated and fixed in 2% PFA overnight, 
washed with cold PBS four times, and then cleared using the Scale reagent as 
reported”. Clarified lung specimen was then counterstained with TO-PRO3 (Life 
Technologies) for nuclear counterstaining and dissected into slices ~ 1 mm thick 
and mounted on a Fastwell with coverslip and sealed. Sections were imaged on a 
Zeiss LSM 710 confocal microscope and analysed in Image] software. Thick sec- 
tions were randomly sampled for single-coloured clones with identical-colour 
labelled cells within 50 jum of each other considered as derived from the same clone. 
Colour and spatially-segregated clones of 1-5 cells were identified and plotted in a 
box plot according to experimental conditions (vehicle versus naphthalene). 
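
The clone-assignment rule above (cells of the same colour within 50 µm of each other belong to one clone) can be illustrated with a short sketch. It assumes that the rule is applied transitively (single linkage), which the text does not state, and the function name, data layout and coordinates are hypothetical.

```python
from math import dist

def group_clones(cells, max_gap_um=50.0):
    """Group cells into clones.

    cells: list of (colour, (x, y, z)) tuples with coordinates in µm.
    Two cells are linked when they share a colour and lie within max_gap_um;
    linked cells are merged transitively (assumed single-linkage rule).
    Returns a list of clones, each a list of cell indices.
    """
    parent = list(range(len(cells)))

    def find(i):
        # Follow parent pointers to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (colour_i, pos_i) in enumerate(cells):
        for j in range(i + 1, len(cells)):
            colour_j, pos_j = cells[j]
            if colour_i == colour_j and dist(pos_i, pos_j) <= max_gap_um:
                parent[find(i)] = find(j)

    clones = {}
    for i in range(len(cells)):
        clones.setdefault(find(i), []).append(i)
    return list(clones.values())

# Hypothetical cells: two nearby red cells, one distant red cell, one green cell.
example_cells = [
    ("red", (0.0, 0.0, 0.0)),
    ("red", (30.0, 0.0, 0.0)),
    ("red", (200.0, 0.0, 0.0)),
    ("green", (10.0, 0.0, 0.0)),
]
clone_sizes = sorted(len(clone) for clone in group_clones(example_cells))
```

Clone sizes obtained this way are what would be plotted per condition (vehicle versus naphthalene).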

Cell counting and image analysis. Sections included in cell count analysis were 
acquired using confocal microscopy. At least four animals per genotype were used. 
Cell counts were performed on ImageJ using the “Cell Counter” plug-in and the 
performer was blinded to the specimen genotype and condition. Results were aver- 
aged between each specimen and standard deviations were calculated per genotype. 
One-tailed paired t-tests were used to determine the P value. Quantification of X-gal
(5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside)-positive or GFP⁺ pixels in
lung sections was performed using ImageJ. Lung sections were captured on a 
Nikon Eclipse light microscope under identical exposures and converted to mono- 
chrome 8-bit images, inverted, and the mean grey value was quantified over the 
X-gal- or GFP-positive area surrounding the airway and vasculature. 
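
The pixel quantification above amounts to inverting an 8-bit monochrome image and averaging the grey values inside a region of interest. The sketch below shows that arithmetic on plain Python lists; it is not the ImageJ procedure itself, and the function name, the mask representation and the pixel values are hypothetical.

```python
def inverted_mean_grey(pixels, mask):
    """Mean grey value of an inverted 8-bit image inside a region of interest.

    pixels: 2-D list of grey values (0-255).
    mask:   2-D list of booleans with the same shape; True marks the region.
    """
    values = [
        255 - value  # inversion of an 8-bit pixel
        for row, mask_row in zip(pixels, mask)
        for value, inside in zip(row, mask_row)
        if inside
    ]
    if not values:
        raise ValueError("region of interest is empty")
    return sum(values) / len(values)

# Hypothetical 2 x 2 image with three pixels inside the region of interest.
image = [[0, 255], [100, 50]]
region = [[True, False], [True, True]]
mean_grey = inverted_mean_grey(image, region)
```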

qPCR. Total RNA was isolated from whole lung or cultured primary lung fibro- 
blasts using the RNeasy kit (Qiagen) and following the manufacturer’s protocol. 
Complementary DNA was synthesized from total RNA using the SuperScript 
Strand Synthesis System (Invitrogen). Quantitative PCR was performed using 
the SYBR Green system (Applied Biosystems) with the following primers: 
Shh F 5′-AAGTACGGCATGCTGGCTCGC-3′
Shh R 5′-QCCACGGAGTTCTCTGCTTTCACAG-3′
Gli1 F 5′-GTGCACGTTTGAAGGCTGTC-3′
Gli1 R 5′-TAAAGGCCTTGCTGCAACCT-3′
GAPDH F 5′-CCCCAGCAAGGACACTGAGCAAGAG-3′
GAPDH R 5′-GGCCCCTCCTGTTATTATGGGGGGT-3′

GAPDH expression values were used to control for RNA quantity. Data are 
shown as the average of a minimum of three biological replicates for each genotype 
per condition ± s.d.
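
One common way to use a reference gene such as GAPDH to control for RNA quantity is the comparative Ct (2^-ΔΔCt) calculation. The text does not say which normalization formula was used, so the sketch below is an assumption offered for illustration, with hypothetical Ct values and function name.

```python
def relative_expression(ct_target, ct_gapdh, ct_target_ctrl, ct_gapdh_ctrl):
    """Fold change of a target gene versus a control sample, normalized to GAPDH.

    Uses the comparative Ct method (assumed here, not stated in the source):
    fold change = 2 ** -((Ct_target - Ct_GAPDH) - (Ct_target_ctrl - Ct_GAPDH_ctrl)).
    """
    delta_ct_sample = ct_target - ct_gapdh
    delta_ct_control = ct_target_ctrl - ct_gapdh_ctrl
    return 2 ** -(delta_ct_sample - delta_ct_control)

# Hypothetical Ct values: the target crosses threshold two cycles later in the sample.
fold_change = relative_expression(26.0, 18.0, 24.0, 18.0)
```

Fold changes from at least three biological replicates would then be averaged and reported with their standard deviation.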

Isolation and culture of lung mesenchymal cells. Whole lung was dissected from 
C57BL6 male adult animals and tracheally perfused with a digestion cocktail of 
Collagenase Type I (450 U ml⁻¹, Gibco), elastase (4 U ml⁻¹, Worthington) and
dispase (1:10, BD Bioscience) and removed from the chest. The lung was further
diced with razor blades and the mixture incubated at 37 °C for 25 min and vor-
texed intermittently. The mixture was then washed with DMEM-F12 and incu-
bated with 0.1% trypsin-EDTA for 20 min and vortexed intermittently. The
mixture was passed through a 100-µm cell strainer and resuspended in RBC lysis
buffer, before passing through a 40-µm cell strainer. The resuspended cells were
cultured on gelatin-treated tissue culture plates with DMEM-F12 plus 10% fetal 
calf serum. Media was refreshed every other day and primary lung mesenchymal 
cells were maintained for no more than three passages. 

Microarray. Primary lung mesenchymal cells were isolated from UBC^creERT2:
R26R^SmoM2 adult mice and grown in DMEM-F12 plus 10% fetal calf serum. The
cells were treated with vehicle or 1 µg ml⁻¹ of 4-OH-tamoxifen in DMEM-F12
without serum and total RNA was isolated after 48 h. Biotinylated cRNA probe 
libraries were generated from these RNA samples and assayed with the Affymetrix 
Mouse Gene 2.0ST genechip. Microarray data were analysed using the Oligo 
package available at the Bioconductor Website (http://www.bioconductor.org). 
The raw data were background-corrected by the robust multichip average 
(RMA) method and then normalized by an invariant set method. Genes with 
80% of samples with an expression signal above the negative control probes were 
considered detectable or present. Differential gene expression analysis between 
control and mutant mice was analysed by the Limma package available at the 
Bioconductor Website. P values were adjusted for multiple comparison using a 
false discovery rate. GO enrichment analysis was performed using the 
Bioconductor package topGO. The Gene Expression Omnibus accession number 
for the microarray data produced in these studies is GSE68201. 
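
The multiple-comparison adjustment mentioned above can be illustrated with the Benjamini-Hochberg procedure. The text says only that a false discovery rate was used, so the choice of Benjamini-Hochberg is an assumption here; the function is a plain-Python sketch with hypothetical P values, not a call into the Limma package.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P values for four genes.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
```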

Cell proliferation assay. Lung mesenchymal cells were isolated from UBC^creERT2:
R26R^SmoM2 animals and plated at 1 × 10⁴ cells per well in 96-well plates and grown
for 3 days with vehicle or 1 µg ml⁻¹ of 4-OH-tamoxifen until cells became con-
fluent. Cells were then incubated in serum-free DMEM-F12 for 24 h before
Pdgf-BB (mouse, R&D) was added and cultured for another 24h. BrdU was then 
added to the media after 24 h of Pdgf-BB incubation and BrdU incorporation was 
assayed after four hours according to manufacturer instructions (Cell Signaling 
Technology, BrdU cell proliferation assay kit). 

Bronchial organoid formation assay. GFP⁺ bronchial epithelial cells were FACS
sorted from Scgb1a1^cre:R26R^mTmG lungs and co-cultured with lung mesenchyme
isolated from UBC^creERT2:R26R^SmoM2 animals (5 × 10° epithelial cells to 5 × 10°
mesenchymal cells per well) in a modified MTEC media diluted 1:1 in growth factor 
reduced Matrigel (Corning). Modified MTEC culture media is comprised of small 
airway basal media (SABM) (Lonza) with selected components from SAGM bullet 
kit (Lonza) including insulin, transferrin, bovine pituitary extract, retinoic acid, and 
gentamicin/amphotericin B. Additional components include 25 ng ml⁻¹ mEGF
(Sigma), 0.1 µg ml⁻¹ cholera toxin (Sigma), and 5% FBS (Life Technologies). Cell
suspension-Matrigel mixture was placed in a transwell and incubated in growth
media with 10 µM ROCK inhibitor (Sigma) in a 24-well plate with vehicle or 1 µg
ml⁻¹ 4-OH-tamoxifen for 48 h, after which the media was replenished every 48 h
(lacking tamoxifen). Colonies were assayed after 14 days. Each experimental con-
dition was performed in quadruplicate and counted blinded to the experimental
condition. Colony forming efficiency = (number of GFP⁺ colonies/number of GFP⁺
epithelial cells cultured per well) × 100. Areas of individual colonies were assayed
on ImageJ and over 140 colonies were randomly sized per experimental condition. 
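
The colony forming efficiency formula above translates directly into code. The sketch below uses hypothetical colony and cell counts and a hypothetical function name; it only restates the formula and does not reproduce any result from the study.

```python
def colony_forming_efficiency(n_colonies, n_cells_plated):
    """Colony forming efficiency as a percentage of plated epithelial cells."""
    if n_cells_plated <= 0:
        raise ValueError("number of plated cells must be positive")
    return 100 * n_colonies / n_cells_plated

# Hypothetical well: 50 GFP-positive colonies from 5,000 plated GFP-positive cells.
cfe_percent = colony_forming_efficiency(50, 5000)
```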

Extended Data Figure 1 | Characterization of Hh signalling in the lung. a, A
small number of TubbIV⁺ ciliated cells express GFP in the Shh^creGFP reporter.
b, c, Gli1⁺ Hh activated cells co-label with S100A4, a fibroblast marker.
d–g, Lineage tracing of Gli1⁺ Hh activated cells shows little to no co-
localization of GFP⁺ cells with the airway smooth muscle (d, e) or the vascular
smooth muscle of the adjacent pulmonary artery (f, g). h, i, The rare exception
occurs in the pulmonary vein where Gli1⁺ Hh activated cells contribute to
the venous smooth muscle that is surrounded by the venous myocardium of the
proximal pulmonary vein. j, Gli1⁺ cells also generate myofibroblasts after
fibrotic injury such as that induced by bleomycin. k, Lung Gli1⁺ cells do not
contribute to cells of haematopoietic lineage as marked by CD45. AW,
airway; V, blood vessel; PV, pulmonary vein. Scale bars, 100 µm. Images
representative of 3 animals with 5 sections examined per animal.

Extended Data Figure 2 | Conditional deletion of Shh from the adult airway
epithelium increases proliferation in the adjacent mesenchyme. a, The
Scgb1a1^cre driver predominantly marks the airway epithelium in the adult lung
when crossed with the R26R^mTmG reporter. b, Whole-lung messenger RNA
transcript analysis reveals efficient deletion of Shh transcripts in the
Scgb1a1^cre:Shh^flox/flox animals compared to controls (Shh^flox/flox)
(n = 4 animals). c–h, Deletion of Shh from the airway epithelium resulted in an
increased expression of proliferative markers, PCNA and phospho-histone H3
(PH3) in the mesenchyme surrounding the airways in Scgb1a1^cre:Shh^flox/flox
mutants (d, f, h, n = 4 animals) versus controls (Scgb1a1^cre:Shh^flox/+)
(c, e, g, n = 3 animals). AW, airway; V, blood vessel. Scale bars, 100 µm.
*P < 0.05. Error bars, mean ± s.d.


©2015 Macmillan Publishers Limited. All rights reserved 



Extended Data Figure 3 | Conditional deletion of Smo from Pdgfrb-derived
mesenchyme increases mesenchymal proliferation at the epithelial-
mesenchymal interface and vascular remodelling. a–d, Deletion of Smo
from Pdgfrb-derived mesenchyme resulted in increased expression of
proliferative markers, PCNA and PH3 in the mesenchyme at the epithelial-
mesenchymal interface of Pdgfrb^cre:Smo^flox/flox mutants (b, d) versus controls
(Pdgfrb^cre:Smo^flox/+) (a, c). e–i, Pdgfrb^cre:Smo^flox/flox:R26R^mTmG/+ mutants
exhibit increased Ki67⁺ cells within lineage traced GFP⁺ cells in the
alveoli compared to controls (n = 4 animals per group). j, k, Aged
Pdgfrb^cre:Smo^flox/flox:R26R^mTmG/+ mutants (>6 months old) spontaneously
develop pulmonary hypertension with increased right ventricular systolic
pressure (j, n = 3 animals per group) and right ventricle wall thickness (k, l).
Scale bars, 100 µm. *P < 0.05. Error bars, mean ± s.d.

Extended Data Figure 4 | Characterization of isolated lung mesenchyme.
a–h, Isolated mesenchymal cells in vitro predominantly express Pdgfra and
Pdgfrb (c, d, compared to isotype control a), but not epithelial marker, Epcam
(b, compared to isotype control a), endothelial marker, CD31 (f, compared
to isotype control e), nor haematopoietic marker, CD45 (h, compared to
isotype control g). i, Expression of the constitutively active form of Smo
(SmoM2) by 4-OH-tamoxifen induction significantly upregulates Gli1
expression in the isolated lung mesenchyme after 48 h. j, GFP staining of the
Pdgfra^GFP reporter demonstrates that Pdgfra⁺ cells are distributed broadly in the
lung. k, Activation of Smo in isolated lung mesenchymal cells leads to
reduced expression of cell cycle progression genes. l, Hh activation of lung
mesenchyme with SmoM2 attenuated the proliferation induced by Pdgf-BB
ligand in vitro as assayed by BrdU incorporation. m–p, s, Expression of
activated Pdgfrb (Pdgfrb°™’) within Gli1⁺ cells resulted in their proliferative
expansion. q–s, Concurrent activation of SmoM2 attenuated the proliferative
expansion induced by Pdgfrb?™* (n = 3 animals per group). Scale bars, 100 µm.
Error bars, mean ± s.d.



Extended Data Figure 5 | Shh expression is decreased with bleomycin and
naphthalene injury to the airway. a–g, Repetitive bleomycin injury after one
month or single-dose naphthalene injury after three days reduced GFP
expression in the airways of the Shh^creGFP reporter compared to controls. Data
represent n = 2 animals per group with 5 sections analysed per animal. AW,
airway. Scale bars, 100 µm. *P < 0.05.

Extended Data Figure 6 | Hedgehog modulates mesenchymal response to
bleomycin injury. a–e, Chronic repetitive injury to the lung epithelium with
bleomycin over 4 weeks downregulates Gli1 expression in the mesenchyme
adjacent to the airways in Gli1^lacZ lungs as noted by histochemical staining for
β-galactosidase activity (a–d) and as noted using X-gal quantification and
qPCR analysis of Shh and Gli1 expression after four weeks of repetitive injury
(e, n = 2 animals per group). f–i, n, Lineage-traced, Gli1⁺ Hh activated lung
mesenchymal cells undergo proliferative expansion after repetitive bleomycin
injury with an increase in Ki67⁺ mesenchymal cells (n = 4 animals per group).
j–m, o, Expression of SmoM2 within lineage traced Gli1⁺ lung mesenchymal
cells attenuates the proliferative expansion that normally follows repetitive
bleomycin injury (n = 3 animals per group). p–v, Gli1 expression remains
reduced 2 months after the end of bleomycin treatment (p–r, n = 2 animals per
group), which might be due to the persistent scarring that is observed after
repetitive bleomycin injury (s–v). AW, airway; V, blood vessel. Scale bars,
100 µm; *P < 0.05. Error bars, mean ± s.d.

Extended Data Figure 7 | Airway epithelium is able to regenerate in vitro
and in vivo. a, b, d, Scgb1a1⁺ secretory epithelium is initially depleted 3 days
following naphthalene injury while TubbIV⁺ ciliated epithelium remains
relatively intact (n = 3 animals per group). c, d, However, 2 months after the
initial naphthalene injury, Scgb1a1⁺ secretory epithelium repopulates the
airway and the ratio of secretory/ciliated cells is restored to levels before injury.
e, f, GFP⁺ bronchial epithelial cells isolated from Scgb1a1^cre:R26R^mTmG animals
were cultured in the presence or absence of isolated lung mesenchymal cells,
and only those co-cultured with lung mesenchymal cells were able to form
organoids. g, h, Examples of the 3-dimensional structures formed by the
bronchial organoids in the presence of lung mesenchyme. i–k, Scgb1a1-derived
organoids predominantly express markers of secretory airway differentiation,
including Scgb1a1 and Nkx2.1 (i, j), while a minority expresses markers of
alveolar epithelial lineage including Sftpc (k). l–n, Co-culture of lung
mesenchyme and bronchial epithelium induces organoid formation (l), which
is inhibited in number (m) and colony size (n) with activation of Hh in the
mesenchyme. AW, airway. Scale bars, 100 µm. *P < 0.05. n = 3 animals per
group for injury time points. Error bars, mean ± s.d. In vitro organoid studies
represent technical quadruplicates, with >140 randomly selected clones
analysed for size per group.

Extended Data Figure 8 | Hh signalling mediates both mesenchymal and
epithelial quiescence during homeostasis and injury repair in the lung. The
lung epithelium actively maintains mesenchymal quiescence through paracrine
Hh signalling, which also regulates a feedback loop to maintain epithelial
quiescence. Epithelial injury leads to downregulation of Hh signalling and loss
of mesenchymal quiescence, which in turn stimulates epithelial regeneration to
replete the airway epithelium until homeostasis is re-established.

RAF inhibitors that evade paradoxical MAPK
pathway activation 

Oncogenic activation of BRAF fuels cancer growth by constitutively
promoting RAS-independent mitogen-activated protein kinase
(MAPK) pathway signalling. Accordingly, RAF inhibitors have
brought substantially improved personalized treatment of metastatic
melanoma. However, these targeted agents have also revealed an
unexpected consequence: stimulated growth of certain cancers.
Structurally diverse ATP-competitive RAF inhibitors can either
inhibit or paradoxically activate the MAPK pathway, depending
on whether activation is by BRAF mutation or by an upstream event,
such as RAS mutation or receptor tyrosine kinase activation. Here
we have identified next-generation RAF inhibitors (dubbed ‘paradox 
breakers’) that suppress mutant BRAF cells without activating the 
MAPK pathway in cells bearing upstream activation. In cells that 
express the same HRAS mutation prevalent in squamous tumours 
from patients treated with RAF inhibitors, the first-generation RAF 
inhibitor vemurafenib stimulated in vitro and in vivo growth and 
induced expression of MAPK pathway response genes; by contrast 
the paradox breakers PLX7904 and PLX8394 had no effect. Paradox 
breakers also overcame several known mechanisms of resistance to 
first-generation RAF inhibitors. Dissociating MAPK pathway inhibi- 
tion from paradoxical activation might yield both improved safety and 
more durable efficacy than first-generation RAF inhibitors, a concept 
currently undergoing human clinical evaluation with PLX8394. 
Selective RAF inhibitors including vemurafenib and dabrafenib
have demonstrated both objective tumour response and, in the case of
vemurafenib, overall survival benefit in mutant BRAF^V600-driven mela-
noma. The clinical effectiveness of RAF inhibitors depends on near-
complete abolition of the MAPK pathway output in tumours harbouring
BRAF mutations. However, these compounds paradoxically activate
the MAPK pathway in cells bearing oncogenic RAS or elevated upstream
receptor signalling. This paradox can promote cellular proliferation
and manifest clinically with progression of cutaneous squamous cell
carcinomas (cuSCC) and keratoacanthomas, sometimes within weeks
of therapy initiation. These paradox-induced skin tumours have an
uncharacteristically high incidence of RAS mutations, raising the
concern that the same mechanism might accelerate progression of
other RAS-driven cancers. Recent case reports of increased incidence
of primary melanomas and progression of RAS-mutant leukaemia and
colon carcinoma during RAF inhibitor treatment add weight to the
concern. Although combination with MEK inhibition represents one
strategy to combat paradoxical activation, and such combinations did
show improved clinical responses, the combination of these two
costly agents yields increased adverse events, and resistance still develops.
Our strategy to develop next-generation RAF inhibitors is thus to design
potent BRAF^V600 mutant inhibitors that avoid paradoxical activation of
MAPK signalling.

Vemurafenib analogues with variable terminal sulfonamide and 
sulfamide substitutions were screened against a panel of cell lines for 
compound-induced change in phospho-ERK1/2 (T202/Y204, pERK). 
For each compound, the dissociation of pERK inhibition from activa-
tion (dubbed ‘ERK pathway inhibition index’ or EPII) was expressed
as the ratio between the compound’s mean pERK activation half-
maximum effective concentration (EC50) in three RAS mutant cell lines
(murine cuSCC cell line B9, human melanoma cell line IPC-298, and
human colorectal carcinoma cell line HCT116, Table 1), and the com-
pound’s mean pERK inhibition half-maximum inhibitory concentra-
tion (IC50) in two BRAF^V600E melanoma cell lines (A375 and
COLO829, Table 1). The EPIIs for vemurafenib and dabrafenib were 


Table 1 | Comparison of the in vitro profile* of first-generation BRAF inhibitors with a paradox breaker

Compound     Biochemical IC50 (µM)                                 pERK inhibition IC50 (µM)           pERK activation EC50 (µM)†                            EPII‡
             BRAF^V600E        BRAF              CRAF              A375              COLO829           B9                IPC-298           HCT116
Vemurafenib  0.031 (±0.004)    0.1 (±0.02)       0.048 (±0.004)    0.032 (±0.007)    0.041 (±0.008)    0.36 (±0.08)      0.54 (±0.12)      0.34 (±0.07)      11
PLX4720      0.013 (±0.005)    0.16 (±0.03)      0.007 (±0.003)    0.044 (±0.006)    0.039 (±0.023)    0.24 (±0.03)      0.4 (±0.05)       0.29 (±0.17)      7
PLX7683      0.029 (±0.021)    1.1 (±0.6)        0.44 (±0.23)      0.98 (±0.75)      1.7 (±0.82)       >200              >200              >200              >100
PLX7904      0.0042 (±0.0006)  0.14 (±0.02)      0.091 (±0.014)    0.016 (±0.005)    0.018 (±0.005)    >200              >200              >200              >10,000
PLX8394      0.0038 (±0.0016)  0.014 (±0.004)    0.023 (±0.04)     0.0035 (±0.0012)  0.0021 (±0.0012)  >200              >200              >200              >50,000
PLX5568      0.58 (±0.07)      0.19 (±0.02)      0.021 (±0.002)    >10               >10               5.1 (±2.5)        3.2 (±1.9)        7 (±3.2)          <0.5
Sorafenib    0.35 (±0.04)      0.072 (±0.008)    0.011 (±0.002)    4.4 (±1.3)        2 (±1.2)          0.025 (±0.005)    0.019 (±0.01)     0.086 (±0.04)     0.01
Dabrafenib   0.0054 (±0.0015)  0.0027 (±0.001)   0.0015 (±0.001)   0.001 (±0.001)    0.005 (±0.003)    0.018 (±0.005)    0.018 (±0.005)    0.0038 (±0.002)   ~4
PLX7922      0.012 (±0.008)    1.1 (±0.4)        0.053 (±0.006)    0.01 (±0.003)     0.014 (±0.008)    >10               3.3 (±2.4)        >10               >500

*Mutational status of the cell lines: A375, BRAF^V600E, homozygous; COLO829, BRAF^V600E, heterozygous; B9, HRAS^Q61L; IPC-298, NRAS^Q61L; HCT116, KRAS^G13D. Each value is an average of more than four
experiments. Values in parentheses, s.e.m.
†EC50, the concentration increasing pERK to 50% compared with the positive control, 10 µM PLX4720.
‡ERK pathway inhibition index (EPII), the ratio between mean pERK activation EC50 and mean pERK inhibition IC50.
§Using the rising portion of the concentration-response curve (Fig. 3c).


1Plexxikon Inc., 91 Bolivar Drive, Berkeley, California 94710, USA. 
11 and 4, respectively. Among the compounds that exhibited more than
100-fold EPII was a molecule (PLX7683, Fig. 1a and Table 1) that con-
tained an N-ethylmethyl-sulfamide moiety in lieu of the propyl-sulfona-
mide tail of vemurafenib. Optimization of this series of compounds by
substitution on the 5-position of the 7-azaindole scaffold generated
PLX7904 (Fig. 1a), which potently inhibited pERK in BRAF^V600E cells
but showed essentially no pERK activation in RAS mutant cell lines at the
concentrations tested (Table 1, Fig. 1b and Extended Data Fig. 1).
PLX7904 was also evaluated in the human SCC cell line A431 and the
human breast adenocarcinoma cell line SKBR3 as these cells achieve
MAPK pathway activation by upstream signals feeding into RAS
(through overexpression of epidermal growth factor receptor (EGFR)
and human epidermal growth factor receptor 2 (HER2), respectively).
Unlike vemurafenib, PLX7904 did not increase pERK levels in these cells
(Fig. 1c). In biochemical assays using recombinant kinases, PLX7904
showed preferential inhibition of the mutated BRAF^V600E over wild-type

Figure 1 | Paradox breakers dissociate MAPK pathway inhibition from
activation. a, Vemurafenib and paradox breakers PLX7683, PLX7904, and
PLX8394. b, pERK IC50 curves (mean ± s.d.) in A375 (BRAF^V600E) cells and
pERK EC50 curves (mean ± s.d., normalized to maximal pERK level induced by
PLX4720) in B9 (HRAS^Q61L) cells (n = 5 experiments). c, pERK in A431
and SKBR3 cells after treatment for 1 h with vemurafenib or PLX7904 (full
scans of western blot in Supplementary Figure 1). Repeated three times.
d, Anchorage-independent growth of B9 cells with vemurafenib and PLX4720
(for 3 weeks) but not PLX7904 (mean ± s.d., two experiments, three replicates
each). e, PLX7904 and vemurafenib inhibited the COLO205 xenograft
growth (mean ± s.e.m., eight mice per group). f, B9 subcutaneous xenografts
were stimulated by vemurafenib (*P < 0.05 by two-sided t-test) but not by
PLX7904 (mean ± s.e.m., ten mice per group).

BRAF and CRAF and a level of kinome selectivity comparable to that of
vemurafenib (Supplementary Table 1).
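
The ERK pathway inhibition index defined earlier (mean pERK activation EC50 in the three RAS mutant lines divided by mean pERK inhibition IC50 in the two BRAF mutant lines) can be recomputed from the Table 1 entries. The sketch below is illustrative only: the function name is hypothetical, and treating a censored ">200" entry as 200 gives a lower bound rather than a point estimate.

```python
from statistics import mean

def epii(activation_ec50_um, inhibition_ic50_um):
    """Ratio of mean pERK activation EC50 to mean pERK inhibition IC50 (both in µM)."""
    return mean(activation_ec50_um) / mean(inhibition_ic50_um)

# Values copied from Table 1 (µM): EC50 in B9, IPC-298, HCT116; IC50 in A375, COLO829.
vemurafenib = epii([0.36, 0.54, 0.34], [0.032, 0.041])
dabrafenib = epii([0.018, 0.018, 0.0038], [0.001, 0.005])

# For PLX7904 every EC50 is reported as ">200", so this is only a lower bound.
plx7904_lower_bound = epii([200, 200, 200], [0.016, 0.018])
```

Rounded to whole numbers, the first two ratios agree with the values of 11 and 4 quoted in the text, and the lower bound for PLX7904 is consistent with the ">10,000" entry in Table 1.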

PLX7904 inhibited the in vitro growth of two aforementioned mel-
anoma cell lines (A375 and COLO829) and an additional human
colorectal cancer cell line COLO205 that expressed BRAF^V600E with
IC50 values of 0.17 µM, 0.53 µM, and 0.16 µM, respectively, on a par
with vemurafenib IC50 values in the same assays (0.33 µM, 0.69 µM,
and 0.25 µM, respectively). Consistent with this in vitro result,
PLX7904 and vemurafenib produced similar anti-tumour effects in a
subcutaneous COLO205 xenograft model (Fig. 1e) with matching
doses (25 mg per kg twice daily) and plasma exposures (steady-state
area under the curve ~200,000 ng ml⁻¹ h).

Recent analyses of the cuSCC and keratoacanthoma lesions
excised from vemurafenib recipients revealed that up to 60% of the
specimens harboured RAS mutations, mostly HRAS^Q61L, supporting
an important role of RAS mutation in BRAF inhibitor-induced cuSCC.
The B9 cuSCC mouse cell line expresses the same activated HRAS^Q61L
allele. In soft agar, both vemurafenib and its analogue PLX4720
stimulated B9 colony formation at concentrations similar to the
growth inhibitory IC50 values in A375, COLO829, and COLO205 cells,
whereas PLX7904 did not (Fig. 1d). When tested in vivo, subcutaneous
B9-tumour growth was accelerated by vemurafenib but not by the
equally potent BRAF^V600E inhibitor PLX7904 when administered at
the same dose (Fig. 1f).

We compared the gene expression changes in B9 cells treated over- 
night with vemurafenib and PLX7904. Vemurafenib altered transcrip- 
tion of 191 mouse genes by at least 1.9-fold, while PLX7904 had 
minimal effects (Extended Data Fig. 2 and Supplementary Table 2). 
Of the genes significantly induced by vemurafenib, three encode EGFR 
ligands: amphiregulin, heparin-binding EGF-like growth factor, and 
transforming growth factor-α (TGF-α) (Extended Data Fig. 2c). The
upregulation of these autocrine growth factors was confirmed at pro- 
tein level and their role in potentiating the transforming potential of 
activated HRAS was demonstrated (Extended Data Figs 3 and 4). 
Induction of these ligands by vemurafenib has been demonstrated 
independently in vemurafenib-resistant lung cancer cell lines. All
three ligands can promote cuSCC. These data implicate EGFR sig-
nalling as a potential molecular link between paradoxical MAPK
activation by RAF inhibitors and secondary malignancies. In contrast 
to vemurafenib and consistent with the paradox breaker profile, 
expression of the EGFR ligands was largely unaffected by PLX7904 
(Extended Data Figs 2-4). 

PLX7904 and a further optimized analogue PLX8394 (Fig. 1a and
Table 1) are only subtly different from vemurafenib based on chemical
structure. To understand how such small molecular alterations cause a
drastic change in the biological profile, we obtained the crystal struc-
ture of PLX7904 in complex with BRAF^V600E (Extended Data Table 1).
The overall binding of PLX7904 (Fig. 2a) is similar to that of vemur-
afenib (Extended Data Fig. 5a) with the terminal N-ethylmethyl group
of PLX7904 occupying the same small interior pocket as the propyl
group of vemurafenib. However, the methyl group of the
N-ethylmethyl moiety forms closer contact with Leu505 in the pocket
(Fig. 2b). Leu505 is one of the four residues that compose the so-called
regulatory spine of kinases (Extended Data Fig. 6). Situated close to
the carboxy (C)-terminal end of the αC helix, Leu505 is the only
residue from that helix that makes direct contact with the inhibitor.
RAS promotes RAF dimerization, and paradoxical MAPK pathway
activation results from binding of the inhibitor to one protomer of a
RAF dimer which allosterically transactivates the other protomer.
The αC helix plays a critical role in RAF dimer formation and
mutations that disrupt the αC helix dimer contacts counteract RAF
activation by inhibitors. In an enzyme-linked immunosorbent assay
(ELISA) of dimerization using cell lysates, vemurafenib and other
known BRAF inhibitors promote BRAF-CRAF heterodimer forma-
tion in RAS mutant cells, whereas the dimer formation is indifferent to
the presence of PLX7904 (Fig. 2c). Although the crystal structure did

Figure 2 | Molecular mechanisms of paradox
breakers. a, Interactions between PLX7904 (green)
and BRAF^V600E (grey). Phe595 (spheres) shows
the DFG-in (or type 1) conformation. Red dashed
lines represent hydrogen bonds. b, The propyl-
sulfonamide tail of vemurafenib (cyan) and the
N-ethylmethyl-sulfamide tail of PLX7904 (green)
viewed from the dimer interface. Four residues
(Ile527, Leu505, Phe595, and Leu567) form the
R-spine (see Extended Data Fig. 6 for definition).
A dotted surface around the N-methyl group in
PLX7904 illustrates its close contact with Leu505
from the αC helix (orange). c, BRAF-CRAF
heterodimer formation in IPC-298 cells with
increasing concentrations of RAF inhibitors
(treatment for 1 h; mean ± s.d., n = 5
experiments). d, Type 2 (that is, DFG-out) binder
PLX5568 is a CRAF-selective inhibitor with inverse
EPII (mean ± s.d., n = 5 experiments).

not reveal further displacement of the αC helix by PLX7904 (Extended
Data Fig. 5c), compared with the vemurafenib-bound structure the
three terminal atoms (Cγ, Cδ1, Cδ2) of Leu505 shifted by 0.6–1 Å to
accommodate PLX7904 (Extended Data Fig. 5d). Leu505 has a higher
side-chain crystallographic temperature factor (B-factor = 75) than
the same residue in the vemurafenib structure (B-factor = 33). In
solution where the protein is free of the artificial constraints of crystal,
the strong interaction between PLX7904 and Leu505 could lead to
outward movement of the αC helix, causing disruption to the RAF
dimer interface. The vemurafenib-resistant L505H BRAF mutant
remained sensitive to PLX7904 (ref. 24), supporting the key role played
by residue 505 in sensing the structural difference between paradox
breakers and first-generation RAF inhibitors.

The crystal structure of PLX4720 in complex with wild-type BRAF
showed that the compound adopts a type 2 kinase inhibitor binding
pose when accessing the inactive conformation of the kinase, the pre-
ferred state of wild-type RAF proteins. A PLX4720 analogue,
PLX5568, made to enforce the type 2 binding orientation (Fig. 2d),
has intrinsic selectivity towards CRAF (Table 1). Like other type 2 RAF
inhibitors such as sorafenib, PLX5568 showed marginal inhibitory
activity against BRAF^V600E cells but still paradoxically activated
pERK in mutant RAS cells, thus exhibiting inverse EPII (Table 1 and
Fig. 2d). These data highlight the existence of a strong correlation
between conformation-specific inhibition and biological outcome.

A newly discovered and potentially common mechanism whereby
BRAF^V600E melanomas develop resistance to BRAF inhibition is to
express aberrantly spliced forms of BRAF^V600E that can dimerize in
the absence of activated RAS. Wild-type BRAF can be activated in a
similar manner when a chromosomal translocation event results in a
truncated C-terminal fragment of BRAF embedded in a fusion gene
with oncogenic activity (Supplementary Table 3 and references therein).
The fusion kinase, like the spliced forms of BRAF^V600E, dimerizes and
has constitutive kinase activity and intrinsic resistance to first-genera-
tion RAF inhibitors. To test whether the paradox breaker strategy can be
exploited to combat dimerization-mediated resistance, we compared
the activity of PLX7904 in the SK-MEL-239 parental cell line and
a representative vemurafenib-resistant clone (C3) expressing a trun-
cated BRAF^V600E. PLX7904 demonstrated minimal shift in pMEK
IC50 and modest increase in growth inhibition IC50 in C3 cells
(Extended Data Fig. 7). Furthermore, both PLX7904 (that is, PB04)
and PLX8394 (that is, PB03) overcame RAF inhibitor resistance in
BRAF fusions characterizing paediatric astrocytomas and maintained
activity against cells that are vemurafenib-resistant through secondary
mutation in NRAS.

The discovery of paradox breakers confirms that the two opposing 
modes of action of RAF inhibitors, either blocking or activating the 
MAPK pathway, can be uncoupled. Since the tail moiety is primarily 
responsible for this uncoupling, we engineered the PLX7904 tail onto 
the dabrafenib scaffold (Fig. 3a, b). The resulting compound, PLX7922, 
showed significantly increased EPII (Fig. 3c and Table 1). Thus, the 
structural and chemical principles of paradox breaking can be applied 
to improve the safety and biological profile of other RAF inhibitors. 

An alternative strategy to overcome paradoxical activation is to com- 
pletely block all RAF isoforms (pan-RAF inhibition), thus severing the 
link between RAS and MEK/ERK. AZ-628, the first compound in this 
class, did show reduced (rather than induced) pERK/pMEK in RAS 
mutant cells, but possesses unfavourable pharmaceutical properties’®. 
Recently, new pan-RAF inhibitors with ancillary activity on upstream 
SRC family kinases have been reported**. The concern for pan-RAF 
inhibitors is that blocking MAPK signalling in normal tissue will cause 
toxicity. Thus, paradox breakers should afford a higher therapeutic index 
than the first-generation RAF inhibitors and pan-RAF inhibitors. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 


With the exception of tumour xenograft studies, no statistical methods were used to 
predetermine sample size. The experiments were not randomized. The investigators 
were not blinded to allocation during experiments and outcome assessment. 
Generation of BRAF small molecule inhibitors. All solvents and reagents were 
used as obtained from commercial sources. Starting materials were purchased 
from commercial sources or prepared according to methods reported in the 
literature. Reactions involving air- or moisture-sensitive reagents were performed 
under a nitrogen atmosphere. NMR spectra were recorded in deuterated solvent 
with an Agilent 400 MHz MR DD2 spectrometer system equipped with an Oxford 
AS400 magnet. Chemical shifts are expressed as δ units and referenced to the 
residual 1H or 13C solvent signal. All coupling constants (J) are reported in hertz (s, 
singlet; d, doublet; t, triplet; q, quartet; m, multiplet; br, broad peak; dd, doublet of 
doublets; ddd, doublet of doublet of doublets; dm, doublet of multiplets). Mass 
spectra were measured with a Shimadzu LCMS-2020 spectrometer coupled to a 
Shimadzu 20A high-performance liquid chromatography (HPLC) system oper- 
ating in reverse mode. Analytical purity was greater than 95% for final compounds 
and was determined using the following HPLC method. Buffer A: 5% acetonitrile, 
95% water, 0.01% formic acid; buffer B: 95% CH3CN, 5% water, 0.01% formic 
acid; SiliaChrom XDB C18, 5 µm, 2.1 × 50 mm, 5-95% B in 6 min, 1.0 ml min−1, 
220 nm and 254 nm, electrospray-ionization-positive (ESI-positive), 300-800 
atomic mass units. 
(2,6-Difluoro-3-nitrophenyl)(5-iodo-1H-pyrrolo[2,3-b]pyridin-3-yl)methanone. 
5-Iodo-1H-pyrrolo[2,3-b]pyridine (160 g, 0.656 mol) and aluminium chloride 
(525 g, 3.94 mol) in nitromethane (1,640 ml) were allowed to stir at room tem- 
perature (20-25 °C) for 1h. Then 2,6-difluoro-3-nitrobenzoyl chloride (218 g, 
0.984 mol) in nitromethane (1,640 ml) was added and the mixture was heated 
at 50°C for 4 days. After cooling to 0 °C, the reaction was quenched with the 
dropwise addition of methanol (1.5 l), resulting in a precipitate. The mixture was 
diluted with water (2 l) and filtered. The crude product was triturated with methyl 
tert-butyl ether and filtered to give the title compound as a tan solid which was 
used directly in the next step (281 g, theory) without further purification. 1H NMR 
(400 MHz, dimethylsulfoxide (DMSO)-d6) 13.18 (br s, 1 H), 8.82 (s, 1 H), 8.62 
(s, 1 H), 8.46 (m, 1 H), 8.40 (s, 1 H), 7.55 (m, 1 H). 
(3-Amino-2,6-difluorophenyl)(5-iodo-1H-pyrrolo[2,3-b]pyridin-3-yl)methanone. 
To (2,6-difluoro-3-nitrophenyl)(5-iodo-1H-pyrrolo[2,3-b]pyridin-3-yl)methanone 
(281 g, 656 mmol) in ethyl acetate (10.9 l) and tetrahydrofuran (10.9 l) 
was added tin(II) chloride dihydrate (517 g, 2.29 mol) portionwise while heating 
at 60°C. The reaction mixture was held at this temperature overnight. After 
cooling to room temperature, the reaction mixture was quenched with 50% satu- 
rated aqueous sodium bicarbonate (1:1 water and saturated aqueous sodium 
bicarbonate) and filtered through Celite washing the cake with ethyl acetate. 
The layers were separated and the organic layer was washed with brine and then 
concentrated under reduced pressure to give the crude product, which was tritu- 
rated with methyl tert-butyl ether and filtered to give the title compound as a tan 
solid (216 g, 541 mmol, 83% yield). 1H NMR (400 MHz, DMSO-d6) 12.96 (br s, 
1 H), 8.72 (s, 1 H), 8.56 (d, J = 2.0 Hz, 1 H), 8.06 (s, 1 H), 6.92 (dd, J = 8.6 Hz, 1 H), 
6.88 (m, 1 H), 5.20 (s, 2 H); liquid chromatography-mass spectrometry (LC/MS) 
(ESI-positive) m/z: 399.9 (M + H+). 
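The quantities reported for this reduction step can be cross-checked with a short calculation. The sketch below is illustrative only: the molecular formula C14H8F2IN3O is an assumption derived here from the compound name, the atomic masses are rounded standard values, and the function name `molar_mass` is hypothetical.

```python
# Rounded standard atomic masses in g/mol.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007,
               "O": 15.999, "F": 18.998, "I": 126.904}

def molar_mass(formula):
    """Sum atomic masses over an element -> count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())

# Assumed formula of the amine product, C14H8F2IN3O (derived from its name).
amine = {"C": 14, "H": 8, "F": 2, "I": 1, "N": 3, "O": 1}

mw = molar_mass(amine)                       # average molar mass, g/mol
moles_product = 216 / mw                     # 216 g of isolated product
moles_start = 0.656                          # 656 mmol of nitro precursor
percent_yield = 100 * moles_product / moles_start

print(f"MW = {mw:.2f} g/mol")
print(f"product = {moles_product * 1000:.0f} mmol")
print(f"yield = {percent_yield:.1f}%")
```

Under these assumptions the product amount comes out at roughly 541 mmol and the yield between 82% and 83%, in line with the reported figures.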
(3-Amino-2,6-difluorophenyl)(5-(2-cyclopropylpyrimidin-5-yl)-1H-pyrrolo[2,3-b]pyridin-3-yl)methanone. 
A mixture of (3-amino-2,6-difluorophenyl)(5-iodo-1H-pyrrolo[2,3-b]pyridin-3-yl)methanone 
(93 g, 233 mmol), 2-cyclopropyl-5-(4,4,5,5-tetramethyl-1,3,2-dioxaborolan-2-yl)pyrimidine 
(229 g, 466 mmol, ~50% purity), potassium carbonate (97.0 g, 702 mmol), and 
[1,1′-bis(diphenylphosphino)ferrocene]dichloropalladium(II) dichloromethane complex 
(19.0 g, 23.3 mmol) in dioxane (930 ml) and water (465 ml) was heated at 
100 °C for several hours. Upon cooling, the reaction mixture was diluted with 
water and extracted with a mixture of tetrahydrofuran and ethyl acetate. The 
organic layer was separated and concentrated under reduced pressure to give the 
crude product, which was triturated with dichloromethane/methyl tert-butyl 
ether and filtered, washing with methyl tert-butyl ether to give the title compound 
as a tan solid (71.0 g, 78% yield). 1H NMR (400 MHz, DMSO-d6) 12.95 (br s, 1 H), 
9.07 (s, 2 H), 8.71 (d, J = 2.3 Hz, 1 H), 8.66 (s, 1 H), 8.11 (s, 1 H), 6.92 (dd, J = 9.0 
Hz, 9.0 Hz, 1 H), 6.89 (ddd, J = 5.9 Hz, 9.0 Hz, 9.0 Hz, 1 H), 5.20 (s, 2 H), 2.27 (m, 
1 H), 1.03-1.22 (m, 4 H); LC/MS (ESI-positive) m/z: 392.2 (M + H+). 
5-(2-Cyclopropylpyrimidin-5-yl)-3-[3-[[ethyl(methyl)sulfamoyl]amino]-2,6- 
difluoro-benzoyl]-1H-pyrrolo[2,3-b]pyridine (PLX7904). To (3-amino-2,6- 
difluorophenyl)(5-(2-cyclopropylpyrimidin-5-yl)-1H-pyrrolo[2,3-b]pyridin-3-yl) 
methanone (53.8 g, 138 mmol) in pyridine (1375 ml) was added ethyl(methyl)sul- 
famoyl chloride (65.0 g, 412 mmol) and the reaction was heated at 65 °C overnight. 
The volatiles were removed under reduced pressure and the residue was par- 
titioned between water and ethyl acetate/tetrahydrofuran. The organic layer was 
concentrated under reduced pressure to give the crude product, which was dry 
loaded onto silica gel and purified by silica gel column chromatography (twice) 
eluting with 0-10% methanol/dichloromethane, then purified by silica gel column 
chromatography eluting with ethyl acetate. The fractions containing the desired 
product were pooled and concentrated under reduced pressure. The resulting solid 
was triturated with methyl tert-butyl ether and filtered to give the title compound as 
a white solid (21.1 g, 30% yield). 1H NMR (400 MHz, DMSO-d6) 13.07 (br s, 1 H), 
9.71 (br s, 1 H), 9.03 (s, 2 H), 8.76 (s, 1 H), 8.68 (s, 1 H), 8.19 (s, 1 H), 7.59 (ddd, 
J = 5.9 Hz, 9.0 Hz, 9.0 Hz, 1 H), 7.27 (dd, J = 9.0 Hz, 9.0 Hz, 1 H), 3.12 (q, J = 7.0 
Hz, 2 H), 2.74 (s, 3 H), 2.29 (m, 1 H), 1.03-1.22 (m, 4 H), 0.95 (t, J = 7.0 Hz, 3 H); 
13C NMR (100 MHz, DMSO-d6) 181.1, 170.4, 156.1 (dd, JCF = 246 Hz, JCF = 6.9 
Hz), 155.5, 152.4 (dd, JCF = 250 Hz, JCF = 8.4 Hz), 149.7, 144.3, 139.2, 128.9, 128.6 
(d, JCF = 9.9 Hz), 127.7, 126.2, 123.0 (dd, JCF = 13.3 Hz, JCF = 3.4 Hz), 118.5 (dd, 
JCF = 24.6 Hz, JCF = 22.5 Hz), 117.9, 116.2, 112.7 (dd, JCF = 22.5 Hz, JCF = 3.4 Hz), 
45.3, 34.4, 18.2, 13.3, 10.9; LC/MS (ESI-positive) m/z: 513.3 (M + H+). 
(3R)-N-[3-[5-(2-cyclopropylpyrimidin-5-yl)-1H-pyrrolo[2,3-b]pyridine-3- 
carbonyl]-2,4-difluoro-phenyl]-3-fluoro-pyrrolidine-1-sulfonamide (PLX8394). 
This material was prepared in a manner analogous to PLX7904 using (3R)-3- 
fluoropyrrolidine-1-sulfonyl chloride in place of ethyl(methyl)sulfamoyl chloride. 
The product was purified by reverse-phase HPLC to provide, after lyophilization, 
the title compound as a white solid. 1H NMR (400 MHz, DMSO-d6) 13.05 (br s, 
1 H), 9.84 (br s, 1 H), 9.01 (s, 2 H), 8.73 (s, 1 H), 8.67 (s, 1 H), 8.15 (s, 1 H), 7.62 
(ddd, J = 5.9 Hz, 9.0 Hz, 9.0 Hz, 1 H), 7.26 (dd, J = 9.0 Hz, 9.0 Hz, 1 H), 5.29 (dm, 
J = 51.6 Hz (H-F), 1 H), 3.43 (dm, 2 H), 3.33 (m, 2 H), 2.27 (m, 1 H), 2.04 (m, 2 H), 
1.01-1.11 (m, 4 H); 13C NMR (100 MHz, DMSO-d6) 181.1, 170.4, 156.2 (dd, 
JCF = 247 Hz, JCF = 6.9 Hz), 155.5, 152.6 (dd, JCF = 249 Hz, JCF = 8.4 Hz), 149.7, 
144.3, 139.2, 128.9, 128.7 (d, JCF = 9.2 Hz), 127.7, 126.2, 122.9 (dd, JCF = 13.7 Hz, 
JCF = 3.8 Hz), 118.5 (dd, JCF = 24.4 Hz, JCF = 22.2 Hz), 117.9, 116.2, 112.7 (dd, 
JCF = 22.9 Hz, JCF = 3.9 Hz), 93.4 (d, JCF = 175 Hz), 54.9 (d, JCF = 22.9 Hz), 46.5, 
32.5 (d, JCF = 21.3 Hz), 18.2, 10.9; LC/MS (ESI-positive) m/z: 542.9 (M + H+). 
2-Tert-butyl-5-(2-chloropyrimidin-4-yl)-4-[3-[[ethyl(methyl)sulfamoyl]amino]- 
2-fluoro-phenyl]thiazole. To a solution of 3-[2-tert-butyl-5-(2-chloropyrimidin-4- 
yl)thiazol-4-yl]-2-fluoroaniline (102 mg, 0.281 mmol) in dichloromethane (1 ml) 
was added pyridine (0.5 ml) followed by ethyl(methyl)sulfamoyl chloride (265 mg, 
1.68 mmol). The reaction was allowed to stir at 50 °C for 96 h. The reaction was 
worked up by extraction with ethyl acetate and 0.1 M HCl (aq). The product was 
purified by flash chromatography (5-30% ethyl acetate in hexanes) which gave 
impure material. This material was again purified by flash chromatography (0.5- 
6% methanol in dichloromethane). This provided the title compound (55 mg, 41% 
yield), which was used in the next step. 1H NMR (400 MHz, CD3CN) 8.45 (d, J = 5.4 
Hz, 1 H), 7.66 (t, J = 7.5 Hz, 1 H), 7.55 (s, 1 H), 7.38 (t, J = 7.5 Hz, 1 H), 7.34 (dd, 
J = 8.0 Hz, 1 H), 7.04 (d, J = 5.4 Hz, 1 H), 3.19 (q, J = 7.2 Hz, 2 H), 2.79 (s, 3 H), 1.51 (s, 
9 H), 1.09 (t, J = 7.3 Hz, 3 H); LC/MS (ESI-positive) m/z: 484.2 (M + H+). 
5-(2-Aminopyrimidin-4-yl)-2-tert-butyl-4-[3-[[ethyl(methyl)sulfamoyl]amino]- 
2-fluoro-phenyl]thiazole (PLX7922). A solution of 2-tert-butyl-5-(2-chloropyrimi- 
din-4-yl)-4-[3-[[ethyl(methyl)sulfamoyl]amino]-2-fluoro-phenyl]thiazole (51 mg, 
0.11 mmol) dissolved in 5 ml of 7M ammonia in methanol in a sealed reaction vial 
was placed in an oil bath at 80°C and allowed to stir. After 48 h, the reaction was 
concentrated under reduced pressure and the resulting residue was purified by 
reverse-phase HPLC to provide the title compound, after lyophilization, as a white 
solid (31 mg, 61% yield). 1H NMR (400 MHz, DMSO-d6) 9.71 (br s, 1 H), 8.04 
(d, J = 5.1 Hz, 1 H), 7.54 (m, 1 H), 7.30 (m, 2 H), 6.77 (br s, 2 H), 6.03 (d, J = 5.1 Hz, 
1 H), 3.06 (q, J = 7.0 Hz, 2 H), 2.67 (s, 3 H), 1.41 (s, 9 H), 0.99 (t, J = 7.0 Hz, 3 H); 13C 
NMR (100 MHz, DMSO-d6) 181.9, 163.9, 159.3, 158.1, 152.2 (d, JCF = 251 Hz), 
145.9, 134.7, 127.8, 126.9 (d, JCF = 13 Hz), 126.5, 125.2 (d, JCF = 5 Hz), 124.3 
(d, JCF = 14 Hz), 105.8, 45.3, 38.1, 34.5, 30.8, 13.2; LC/MS (ESI-positive) m/z: 
465.2 (M + H+). 
N-[3-[(5-chloro-1H-pyrrolo[2,3-b]pyridin-3-yl)-hydroxy-methyl]-2,4-difluoro- 
phenyl]-4-(trifluoromethyl)benzenesulfonamide. To a solution of N-(2,4-difluoro- 
3-formyl-phenyl)-4-(trifluoromethyl)benzenesulfonamide (83.4 g, 0.228 mol) and 
5-chloro-1H-pyrrolo[2,3-b]pyridine (34.8 g, 0.228 mol) in anhydrous methanol 
(350 ml) was added potassium hydroxide (38.4 g, 0.684 mol). The reaction mixture 
was stirred at room temperature, under nitrogen, for 3 h and poured into water (1 l). 
The product was extracted with ethyl acetate (2 × 800 ml). The organic layers were 
combined, washed with brine (800 ml), dried, and concentrated under reduced 
pressure to yield a brown solid. This solid was suspended in acetonitrile (10 vol) 
overnight with stirring and then cooled in an ice bath for 3h. The solids were 
isolated by filtration, washed with a minimum of cold acetonitrile, and dried to 
provide the title compound (56.8 g, 48% yield). 1H NMR (400 MHz, DMSO-d6) 
11.77 (s, 1 H), 10.38 (s, 1 H), 8.17 (d, J = 2.3 Hz, 1 H), 7.88 (s, 4 H), 7.75 (d, J = 2.3 
Hz, 1 H), 7.18 (s, 1 H), 7.17 (m, 1 H), 7.05 (t, J = 9.0 Hz, 1 H), 6.20 (d, J = 4.9 Hz, 1 
H), 6.02 (d, J = 4.9 Hz, 1 H); LC/MS (ESI-positive) m/z: 518.0 (M + H+). 
N-[3-(5-chloro-1H-pyrrolo[2,3-b]pyridine-3-carbonyl)-2,4-difluoro-phenyl]- 
4-(trifluoromethyl)benzenesulfonamide (PLX5568). To a solution of N-[3-[(5- 
chloro-1H-pyrrolo[2,3-b]pyridin-3-yl)-hydroxy-methyl]-2,4-difluoro-phenyl]-4- 
(trifluoromethyl)benzenesulfonamide (100 g, 0.193 mol) in tetrahydrofuran (2.5 l) 
was added Dess-Martin periodinane (99.2 g, 0.234 mol) under nitrogen. When 
the reaction was complete, the mixture was poured into 1 M sodium thiosulfate 
(700 ml) and saturated sodium bicarbonate solution (700 ml) and then extracted 
with ethyl acetate (2 × 700 ml). The organic layers were combined, washed with 
brine (800 ml), dried, and concentrated under reduced pressure to yield a brown 
solid. This residue was stirred in ethyl acetate (1 l) and silica (100 g) for 45 min and 
diluted with hexane (500 ml). The mixture was poured through a plug of silica and 
the product was eluted with 50:50 hexane:ethyl acetate. The fractions containing 
the product were combined and concentrated under reduced pressure to yield 
crude product as a yellow solid. Recrystallization of the crude product from ethanol 
provided the title compound as a pale yellow solid (71 g, 71% yield). 1H NMR 
(400 MHz, DMSO-d6) 13.13 (s, 1 H), 10.51 (s, 1 H), 8.43 (s, 1 H), 8.38 
(d, J = 2.4 Hz, 1 H), 8.18 (d, J = 2.4 Hz, 1 H), 7.93 (s, 4 H), 7.44 (m, 1 H), 7.28 
(m, 1 H); 13C NMR (100 MHz, DMSO-d6) 180.7, 157.0 (dd, JCF = 247 Hz, 
JCF = 7.3 Hz), 153.5 (dd, JCF = 251 Hz, JCF = 8.4 Hz), 148.1, 144.0, 143.9, 
139.9, 133.1 (q, JCF = 32.3 Hz), 130.4 (d, JCF = 9.2 Hz), 128.7, 128.1, 126.9 
(q, JCF = 3.8 Hz), 126.3, 123.8 (q, JCF = 273 Hz), 121.3 (dd, JCF = 13.8 Hz, 
JCF = 3.8 Hz), 118.6, 118.4 (dd, JCF = 25.4 Hz, JCF = 22.6 Hz), 115.3, 113.1 
(dd, JCF = 23.1 Hz, JCF = 3.8 Hz); LC/MS (ESI-positive) m/z: 516.1 (M + H+). 
In vitro and in vivo studies. Biochemical assays and kinome selectivity profiling. 
The in vitro RAF kinase activities were determined by measuring phosphorylation 
of a biotinylated substrate peptide as described previously”. PLX7904 was also 
tested against a panel of 287 kinases at concentrations of 1 µM in duplicate. 
Kinases inhibited by over 50% were followed up by IC50 determination. The 
287 kinases represent all major branches of the kinome phylogenetic tree. The 
inhibition screen of 287 kinases was performed under contract as complementary 
panels at Invitrogen (Life Technologies) SelectScreen profiling service, DiscoverX 
KINOMEScan service, and Reaction Biology Corporation Kinase HotSpot service. 
Cell culture experiments. The B9 cell line was a gift from A. Balmain. The SK-MEL- 
239 and SK-MEL-239-C3 cell lines were provided by D. Solit and N. Rosen. The 
IPC-298 cell line was purchased from DSMZ. All other cell lines (A375, A431, 
COLO829, HCT116, and SKBR3) were purchased from ATCC. All cell lines were 
authenticated at the source by STR profiling and tested negative for mycoplasma 
contamination before use. Compound dilutions were done in 100% DMSO and 
these titrations were diluted 500-fold in culture medium when added to cells, 
resulting in a final 0.2% DMSO concentration. Final compound concentrations 
are listed in text and figures. 
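The dilution arithmetic in the cell culture paragraph can be written out as a minimal sketch. The helper names are hypothetical, and the 10 µM final concentration is an illustrative assumption rather than a value taken from the paper.

```python
def final_percent(stock_percent, fold_dilution):
    """Solvent percentage after diluting a stock by the given fold."""
    return stock_percent / fold_dilution

def stock_needed(final_conc, fold_dilution):
    """Stock concentration required to reach a final concentration."""
    return final_conc * fold_dilution

# 100% DMSO titrations diluted 500-fold into culture medium.
dmso_final = final_percent(100.0, 500)

# Hypothetical example: a 10 µM final concentration needs a 5,000 µM
# (5 mM) DMSO stock at the same 500-fold dilution.
stock_um = stock_needed(10.0, 500)

print(f"final DMSO = {dmso_final}%")
print(f"stock = {stock_um} µM")
```

Because every well receives the same 500-fold dilution, the DMSO content stays constant across the titration and only the compound concentration varies.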

Phospho-ERK AlphaScreen assay. To determine the effects of compound treat- 
ment on phosphorylation of ERK1/2, cells were plated in a 96-well plate and 
treated with an eight-point titration of compound for 1 h at 37 °C before lysis. To 
detect pERK, cell lysates were incubated with streptavidin-coated AlphaScreen 
donor beads, anti-mouse IgG AlphaScreen acceptor beads, a biotinylated anti- 
ERK1/2 rabbit antibody, and a mouse antibody that recognized ERK1/2 only 
when it was phosphorylated on Thr202 and Tyr204. The biotinylated ERK1/2 
antibody bound both to the streptavidin-coated AlphaScreen donor beads and 
to ERK1/2 (regardless of its phosphorylation state), and the phospho-ERK1/2 
antibody bound to the acceptor beads and to ERK1/2 that was phosphorylated at 
Thr202/Tyr204. An increase in ERK1/2 phosphorylation at Thr202/Tyr204 
brought the donor and acceptor AlphaScreen beads into close proximity, gen- 
erating a signal that could be quantified on an EnVision reader (Perkin Elmer). 
Inhibition of ERK phosphorylation resulted in a loss of signal compared with 
DMSO controls. 

Phospho-ERK immunoblot analysis. Western blots were performed by standard 
techniques and analysed on an Odyssey Infrared Scanner (Li-COR Biosciences). 
The following antibodies were used: pERK1/2 (T202/Y204) and ERK1/2 (Cell 
Signaling). 

Growth inhibition assay. Cells were plated into a 96-well plate at a density of 
3,000 cells per well and allowed to adhere overnight. Compounds were dissolved 
in DMSO, diluted threefold to create an eight-point titration, and added to cells. 
After incubation for 72h, cell viability was examined using CellTiter-Glo 
(Promega). 
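An eight-point, threefold titration of the kind described for the growth inhibition assay can be generated as follows. This is a sketch only: the top concentration of 10 µM is an assumption for illustration, and the function name `titration` is hypothetical.

```python
def titration(top, fold=3.0, points=8):
    """Serial dilution series, highest concentration first."""
    return [top / fold ** i for i in range(points)]

# Assumed top concentration of 10 µM (not stated in this passage).
series = titration(10.0)

for conc in series:
    print(f"{conc:.4f} µM")

# Ratio between the highest and lowest point of the series.
span = series[0] / series[-1]
```

Eight points at threefold spacing cover a 3^7 = 2,187-fold concentration range, which is wide enough to bracket an IC50 within roughly three orders of magnitude.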

Anchorage-independent growth assay. Twenty-five thousand B9 cells were plated 
in each well of a six-well plate with a bottom layer of 1% and a top layer of 0.4% low 
melting agar (Sigma A4018) containing RPMI1640 medium with 10% FBS. For the 
RAF inhibitor study, B9 cells grown in soft agar were treated with vemurafenib, 
PLX4720 or PLX7904 at the indicated concentrations, or DMSO at 0.2% final 
concentration for 3 weeks. For the EGFR ligand study, B9 cells grown in soft agar 
were treated with AREG (R&D Systems 989-AR), TGF-α (R&D Systems 239-A), 
or HB-EGF (R&D Systems 259-HE) at the indicated concentrations for 3 weeks. 
For the vemurafenib and erlotinib combination study, B9 cells grown in soft agar 
were treated with vemurafenib, erlotinib, or a combination of the two compounds 
at the indicated concentrations, or DMSO for 3 weeks. Anchorage-independent 
colonies ≥ 100 µm were scored using AxioVision Rel 4.8 software (Carl Zeiss). 
ELISA for detecting EGFR ligands. Twenty thousand B9 cells were plated in each 
well of a 96-well plate and treated with DMSO control or compounds at the 
indicated concentrations for 48 h. Cell supernatants were collected and cells were 
lysed using 1× cell lysis buffer (CST 9803). The amounts of AREG, TGF-α, and 
HB-EGF in cell supernatants or cell lysates were determined with the use of ELISA 
Development kits (R&D Systems DY989, DY239, and 259-HE-050N) according to 
the manufacturer’s instructions. 

EGFR signalling assay. B9 cells were treated with 1 µM or 5 µM vemurafenib or 
control vehicle for the indicated times in the absence of serum. Supernatants from 
treated B9 cells were then collected and added to newly plated, serum-starved 
(overnight) B9 cells for 10 min. Cells were washed with 1× PBS twice, lysed, and 
subjected to western blot analysis. pEGFR Y1068, EGFR, pAkt S473, and Akt 
antibodies were purchased from Cell Signaling Technology. 

RAF dimerization assays. BRAF-CRAF heterodimerization was characterized in cell 
lysates. Cells were plated on 96-well dishes and allowed to adhere overnight at 37 °C. 
Cells were treated with compound or DMSO for 1h at 37 °C before lysis in RIPA 
buffer containing protease and phosphatase inhibitors. The lysates were transferred 
to ELISA plates coated with a monoclonal CRAF capture antibody, and incubated 
overnight at 4°C. Further incubations with a polyclonal BRAF detection antibody 
and a horseradish-peroxidase-labelled secondary antibody were done at room tem- 
perature. After incubation with TMB substrate and sulfuric acid, the signal was 
analysed by measuring absorbance at 450 nm on a Tecan Safire plate reader. 
Microarray gene expression analysis. B9 cells were plated in 1 µM vemurafenib, 
1 µM PLX7904 or 0.2% DMSO vehicle control and incubated for 17 h. Cells were 
harvested, total RNA was isolated (RNeasy Mini Kit, Qiagen), and gene expression 
was measured using Affymetrix Mouse430_2 chips following the manufacturer’s 
instructions. Vemurafenib response genes were identified by requiring the ratio 
between the treated and vehicle control samples be more than 1.9 (upregulated) or 
less than 0.54 (downregulated). 
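The ratio thresholds used to call response genes can be expressed as a small classifier. The sketch assumes linear-scale (not log-transformed) signal values, which the text does not specify, and the function name `classify` is hypothetical.

```python
UP_THRESHOLD = 1.9     # treated/vehicle ratio above this: upregulated
DOWN_THRESHOLD = 0.54  # treated/vehicle ratio below this: downregulated

def classify(treated, vehicle):
    """Label a probe by the ratio of treated to vehicle-control signal."""
    ratio = treated / vehicle
    if ratio > UP_THRESHOLD:
        return "upregulated"
    if ratio < DOWN_THRESHOLD:
        return "downregulated"
    return "unchanged"

# Illustrative signal values, not data from the study.
print(classify(200.0, 100.0))  # ratio 2.0
print(classify(50.0, 100.0))   # ratio 0.5
print(classify(120.0, 100.0))  # ratio 1.2
```

A probe whose ratio falls between the two cut-offs is treated here as unchanged.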

Tumour xenograft studies. All animal studies were conducted in accordance with 
the Institute for Laboratory Animal Research Guide for the Care and Use of 
Laboratory Animals and the US Department of Agriculture’s Animal Welfare 
Act and approved by the institutional review board at testing facilities. Sample 
size (number of mice per group) was selected to provide at least 80% power to 
detect a two s.d. difference of mean tumour volume between two groups with 
two-sided type I error = 1%. The same formulation was used for both COLO205 
and B9 xenograft studies. The powder of the test compound was dissolved in 
pure N-methyl-2-pyrrolidone. Diluent consisted of PEG400:TPGS:Poloxamer 
407:water (40:5:5:50). Before gavage administration, fresh stock of N-methyl-2- 
pyrrolidone compound solution (or N-methyl-2-pyrrolidone for vehicle) was 
thoroughly mixed with the diluents to make a uniform suspension. Dosing 
volume was 5 µl g−1. On the last day of the efficacy study, blood samples were 
collected at 0, 2, 4, and 8h after last dosing, two animals per time point, for 
pharmacokinetic analysis. Animals were fed a standard rodent diet and water 
was supplied ad libitum. Tumour measurements were taken with an electronic 
microcalliper three times weekly. In addition, body weights were recorded at 
these times. Test facility investigators were blinded to the group allocation 
during the experiment. 
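The stated design target (at least 80% power to detect a difference of two s.d. at a two-sided type I error of 1%) can be approximated with the standard normal-theory formula for comparing two group means. This is a sketch under that assumption; the paper does not say which method was used, and an exact t-based calculation would give a somewhat larger number. The function name `mice_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def mice_per_group(effect_sd, alpha, power):
    """Normal-approximation sample size per group for a two-sample comparison.

    effect_sd: difference in means, in units of the common s.d.
    alpha:     two-sided type I error rate
    power:     desired probability of detecting the difference
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_sd) ** 2)

n = mice_per_group(effect_sd=2.0, alpha=0.01, power=0.80)
print(n)  # 6 under this approximation
```

The group sizes reported for the xenograft studies (eight and ten mice) exceed this approximate minimum.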

COLO205 tumour cells were cultured in DMEM 10% FBS 1% penicillin/strep- 
tomycin supplemented with bovine insulin, at 37°C. Balb/C nude mice, female, 
6-8 weeks old, weighing approximately 18-22 g, were inoculated subcutaneously 
at the right flank with COLO205 tumour cells (5 × 10⁶) in 0.1 ml of PBS mixed 
with matrigel (50:50) for tumour development. The treatment was started when 
mean tumour size reached approximately 100 mm³, with eight mice in each 
treatment group randomized to balance the average weight and tumour size. B9 
cells were expanded in DMEM 10% FBS 1% penicillin/streptomycin. Upon tryp- 
sinization the cells were washed three times with 20 ml RPMI, and after the final 
centrifugation were re-suspended, counted, and adjusted by volume to a final 
concentration of 5 × 10⁷ cells per millilitre. B9 xenografts were started by injection 
of 5 × 10⁶ cells subcutaneously in 6- to 7-week-old female nude Balb/c mice. 
Compound dosing started when the average size of tumours reached 50-70 mm³. 
Animals were equally distributed over treatment groups (n = 10) to 
balance the average tumour size and body weight. Animals were dosed orally 
for days 1-14 twice daily and days 15-28 once daily with vehicle, vemurafenib 
50 mg per kg, or PLX7904 50 mg per kg. 12-O-tetradecanoylphorbol-13-acetate 
(TPA) was put on the skin of all mice twice a week during weeks 3 and 4 at a dose 
of 2 µg in 200 µl acetone. 
Crystallization and structure determination. Expression and purification of BRAF 
and BRAF(V600E) were performed as previously described. Crystallization drops 
were prepared by mixing the protein solution with 1 mM of compound and the same 
amount of reservoir, and drops were incubated by vapour diffusion (sitting drops) at 
4°C. The mother liquor used to obtain co-crystals of PLX7904, dabrafenib, and 
PLX7922 with BRAF’™° consisted of 0.1 M BisTris at PH 6.0, 12.5% 2,5-hexane- 
diol, 12% PEG3350; the reservoir used to obtain co-crystals of PLX5568 with 
BRAF™” contained 0.1 M MES at pH 6.0, 35% (v/v) 2-methyl-2,4- pentanediol, 
and 0.2 M Li2SO4. All co-crystals were flash-frozen with liquid nitrogen, but 
BRAF(V600E) co-crystals were soaked in a solution containing the mother liquor plus 
20% glycerol, before flash-freezing. X-ray diffraction data were collected at beamline 
8.3.1 at the Advanced Light Source (Lawrence Berkeley Laboratory) and beamline 
9.1 at Stanford Synchrotron Radiation Lightsource (Stanford University). Data were 
processed and scaled using MOSFLM"'! and SCALA in the CCP4 package”. All co- 
structures were solved using molecular replacement with the program MOLREP”’. 
The starting models used were the inhibitor-bound BRAPY °°! and BRAFP™?, 
respectively (Protein Data Bank accession numbers 4FK3 and 1UW)). The final 
models were obtained after several rounds of manual rebuilding and refinement 
with PHENIX™ and REFMAC*’. A summary of the crystallography statistics is 
included in Extended Data Table 1. 
Extended Data Figure 1 | Differential effects of PLX7904 and vemurafenib 
on MAPK signalling. PLX7904 (black) and vemurafenib (red) show similar 
potency to block pERK signalling in human BRAF(V600E) melanoma cell line 
COLO829 (a); but in RAS activated human melanoma cell line IPC-298 
(NRAS(Q61L)) (b) and human colorectal carcinoma cell line HCT116 
(KRAS(G13D)) (c), vemurafenib paradoxically activates MAPK signalling while 
PLX7904 causes negligible pERK increase. d, Expanded view of the pERK 
curves showing that PLX7904 inhibits pERK at high concentrations in three 
RAS mutant cell lines, with apparent IC50 (IC50app) values in the 100 µM range. 
Therefore, paradox breakers are not expected to affect the MAPK pathway in 
normal tissues (either paradoxical activation or inhibition) at therapeutic 
concentrations. The pERK curves were generated using an AlphaScreen assay. 
Mean ± s.d., n = 5 independent experiments. 
Extended Data Figure 2 | Gene expression analysis of B9 cells in response to 
either vemurafenib or PLX7904 treatment (both at 1 µM concentration). 
a, b, Hierarchical clustering of the 236 Affymetrix mouse gene probes (see 
Supplementary Table 2 for a complete list) that were upregulated (a) or 
downregulated (b) by vemurafenib (233 probes) or PLX7904 (4 probes). The 
single overlap, Cyp1b1, and four representative MAPK pathway-responsive 
genes as well as three genes that encode EGFR ligands are marked. Two 
independent experiments are shown. MAPK pathway response genes Spry2, 
Fos, and Egr1 were upregulated by vemurafenib. The corresponding human 
genes are known to be suppressed by vemurafenib in BRAF(V600E) mutant 
human melanoma. Opposing changes in expression were also observed with 
the Id2 gene. c, Changes in the messenger RNA levels of four EGFR ligands 
(amphiregulin, HB-EGF, TGF-α, and epiregulin) along with EGFR itself in B9 
cells treated with vemurafenib or PLX7904. All four EGFR ligands abundantly 
expressed in B9 cells were induced by vemurafenib, but the expression of EGFR 
and other ERBB family members remained unchanged. 
Extended Data Figure 3 | EGFR ligands may mediate vemurafenib-induced 
cuSCC. ELISA assays demonstrate increased levels of amphiregulin (a) and 
TGF-α (b) proteins in the supernatants and HB-EGF (c) in the cell lysates of B9 
cells after vemurafenib treatment for 48 h. PLX7904 does not induce the 
expression of EGFR ligands. Like vemurafenib, exogenous amphiregulin (d), 
TGF-α (e), and HB-EGF (f) promote the anchorage-independent growth of 
B9 cells. B9 cells grown in soft agar were treated with EGFR ligands at the 
indicated concentrations for 3 weeks. Error bars, s.d. (a-c), s.e.m. (d-f); n = 5 
(a-c) and 6 (d-f) independent experiments. 
Extended Data Figure 4 | Effect of BRAF inhibitors on EGFR signalling. 
a, EGFR signalling measured by levels of phosphorylated EGFR and AKT after 
a brief (10 min) exposure of serum-starved B9 cells to supernatant collected 
from B9 cells treated with vemurafenib for the indicated time. b, Pre-treatment 
with EGFR inhibitor erlotinib (ERL) inhibited EGFR signalling induced by 
supernatants from vemurafenib (VEM)-treated B9 cells. Serum-starved B9 cells 
were pre-treated with 3 µM erlotinib before starting a 10 min exposure to the 
supernatants. Supernatants were collected from B9 cells treated with 
vemurafenib or PLX7904 for 3 days. c, Erlotinib inhibits the soft agar colony 
forming capacity of vemurafenib in B9 cells. Panels a and b are representative 
of results from three independent experiments. Error bars in c, s.e.m.; n = 6 
independent experiments. Full scans of western blot data are presented in 
Supplementary Figure 1. 
Extended Data Figure 5 | Comparison of inhibitor-bound BRAF structures. 
a, Perfect alignment between vemurafenib and PLX7904-bound BRAF 
structures (backbone root mean squared deviation 0.22 Å). b, An overlay of the 
structures of BRAF bound to four inhibitors: sorafenib, PLX5568, vemurafenib, 
and PLX7904 (colour scheme same as c). c, Outward movement of αC 
helix in response to different inhibitors. From sorafenib to PLX5568 to 
vemurafenib, the degree of outward shift correlates with increasing ERK 
pathway inhibition index (Table 1). d, Close-up view showing the Leu505 side- 
chain position in the four structures. PLX7904 pushes the tip of Leu505 side- 
chain away by 1 Å from its position in the vemurafenib-bound structure. 
Extended Data Figure 6 | The regulatory spine (R-spine) in BRAF. 

a, b, R-spine refers to four conserved hydrophobic residues that form a column 
in the active state of a kinase, and the distortion or disassembly of the spine 
marks the transition to an inactive state”. The term was introduced using PKA 
as the template, and the four residues that compose the R-spine included Leu95, 
Leu106, Tyr164, and Phe185 (PKA numbering). The corresponding residues in 
BRAF are Leu505, Phe516, His574, and Phe595 (rendered here in spheres). 
Tyr164 of PKA, which is a histidine (for example, His574 in BRAF) in most 
other kinases, forms hydrogen bonds with the backbone of the DFG motif and 
packs against the side chain of DFG Phe185 (the corresponding residue in 


BRAF is Phe595). In the BRAF structure, Leu567 from αE helix also makes 
direct hydrophobic contacts with Phe595, an interaction that is conserved 
across the kinome. Leu567, Phe595, Leu505, along with another hydrophobic 
residue Ile527 that packs against Leu505, form a column (dubbed here as 
R-spine’) with an axis tilted 45° from that of the R-spine. Analyses of published 
kinase structures show that all four R-spine’ residues could be involved in 
kinase inhibitor binding whereas the two outer residues of R-spine rarely make 
direct contacts with inhibitors. Therefore, R-spine’ is more relevant for 
studying inhibitor-induced conformational change in kinases. 
Extended Data Figure 7 | Vemurafenib-resistant cells remain relatively 
sensitive to paradox breakers. a, pMEK and b, growth IC50 curves 
(mean ± s.d., n = 5 experiments) for vemurafenib and PLX7904 in the 
SKMEL-239 parental cell line and a representative vemurafenib-resistant clone 
(C3) that expresses a spliced variant of BRAF(V600E) promoting dimerization. 
Extended Data Table 1 
* Data for each structure is collected from a single crystal. 
Intercellular wiring enables electron transfer 
between methanotrophic archaea and bacteria 


Gunter Wegener'*, Viola Krukenberg!*, Dietmar Riedel*, Halina E. Tegetmeyer*° & Antje Boetius'* 


The anaerobic oxidation of methane (AOM) with sulfate controls 
the emission of the greenhouse gas methane from the ocean floor’. 
In marine sediments, AOM is performed by dual-species consortia 
of anaerobic methanotrophic archaea (ANME) and sulfate-reducing
bacteria (SRB) inhabiting the methane-sulfate transition
zone’. The biochemical pathways and biological adaptations 
enabling this globally relevant process are not fully understood. 
Here we study the syntrophic interaction in thermophilic AOM 
(TAOM) between ANME-1 archaea and their consortium partner 
SRB HotSeep-1 (ref. 6) at 60 °C to test the hypothesis of a direct 
interspecies exchange of electrons”*. The activity of TAOM con- 
sortia was compared to the first ANME-free culture of an AOM 
partner bacterium that grows using hydrogen as the sole electron 
donor. The thermophilic ANME-1 do not produce sufficient 
hydrogen to sustain the observed growth of the HotSeep-1 partner. 
Enhancing the growth of the HotSeep-1 partner by hydrogen addi- 
tion represses methane oxidation and the metabolic activity of 
ANME-1. Further supporting the hypothesis of direct electron 
transfer between the partners, we observe that under TAOM con- 
ditions, both ANME and the HotSeep-1 bacteria overexpress genes 
for extracellular cytochrome production and form cell-to-cell con- 
nections that resemble the nanowire structures responsible for 
interspecies electron transfer between syntrophic consortia of 
Geobacter’. HotSeep-1 highly expresses genes for pili production 
only during consortial growth using methane, and the nanowire- 
like structures are absent in HotSeep-1 cells isolated with hydro- 
gen. These observations suggest that direct electron transfer is a 
principal mechanism in TAOM, which may also explain the enig- 
matic functioning and specificity of other methanotrophic 
ANME-SRB consortia. 

The anaerobic oxidation of methane with sulfate (AOM) controls 
the emission of methane from the seabed’**. At environmental con- 
ditions the net reaction CH₄(aq) + SO₄²⁻ → HS⁻ + HCO₃⁻ + H₂O
allows an energy yield of only −20 to −40 kJ per mol of methane
oxidized, shared between the two partner organisms. Generally, AOM 
consortia show exceptionally slow growth with generation times >2 
months, which has so far impeded their cultivation®"’. Sulfate-coupled 
AOM in marine habitats is performed by members of three different 
ANME clades (ANME-1, -2 and -3), which associate physically with 
specific partner bacteria of the Desulfosarcina/Desulfococcus or the 
Desulfobulbus cluster’, indicating an obligate functional role of 
the SRB in AOM. Early studies had already suggested a syntrophic 
coupling of both partners via a transfer of reducing equivalents*”™’, 
yet the underlying mechanisms remain unknown. Biochemically, 
the anaerobic oxidation of methane appears in the ANME and 
involves a reversal of the enzymatic machinery of the methanogenesis 
pathway'’*'*. However, reversing an energy-yielding process is per se 
endergonic, and hence AOM requires an efficient transfer of reducing 
equivalents from methane to sulfate, so that the ANME can gain
energy by AOM"**. Previous results indicate that the partner
SRB*”° act as electron sinks of AOM, but recently members of the 
ANME-2 clade were also suggested to perform incomplete sulfate 
reduction by an as yet unknown pathway”’. 

In this study we focus on the hypothesis of syntrophic growth in 
thermophilic AOM consortia by direct interspecies electron transfer, 
and test this and alternative hypotheses (mechanisms illustrated 
in Extended Data Fig. 1). The studied sediment-free TAOM enrich- 
ment was cultivated at 60°C and supplied with 28 mM sulfate and 
0.2 MPa methane, allowing an energy yield (ΔG) of −34 kJ mol⁻¹,
and resulting in doubling times of approximately 68 days (Fig. 1a) and
growth efficiencies of 2% (see Methods). The culture is dominated by 
consortia of ANME-1 and HotSeep-1 appearing in an approximate 
1:1 stoichiometry. Owing to their larger size ANME account for 
around 75% of the consortial biomass (Fig. 1b and Extended Data 
Table 1). Using a dilution-to-extinction approach (1:10 to 1:10°) with 
hydrogen (0.2 MPa) and sulfate (28 mM), we were able to separate a 
strain of HotSeep-1 that was identical to the partner bacterium of the
TAOM consortium (>99% identity in 16S ribosomal RNA gene and
internal transcribed spacer region, and (genomic) average nucleotide
identity >99%, Extended Data Table 2). This strain grows without
ANME-1 as single cells or in mono-species aggregates (Extended
Data Fig. 2a) and with a single contaminant, Archaeoglobus sp. (1-5%
of all cells), which does not form consortia with HotSeep-1. Substrate
tests with the HotSeep-1 culture showed that it is an obligate
chemolithoautotroph, with hydrogen and sulfate as the sole molecular redox
couple and doubling times of 4 to 6 days (see Methods and Extended
Data Fig. 2b, c). Although the supplied hydrogen (0.2 MPa) could
provide a tenfold higher energy yield to HotSeep-1 than syntrophy in
TAOM (ΔG = −151 kJ mol⁻¹ versus −17 kJ mol⁻¹, the latter being
half of the net energy yield of TAOM consortia), its carbon assimilation
efficiency remained similarly low (approximately 1.5% of converted
reducing equivalents).

Figure 1 | Activity of the TAOM consortia in culture. a, The exponential
increase of sulfide production translates to a doubling time of 68 days
(biological replicates n = 4). b, Representative fluorescence micrograph of
TAOM consortia from a; red, ANME-1; green, HotSeep-1. Scale bar, 10 μm.
Representative of 14 similar images recorded. c, Sulfide production under
TAOM conditions (red circles, 0.07 mM sulfide per day) versus a control (white
squares, 0.02 mM per day). Hydrogen (blue triangles), or hydrogen plus
methane (purple stars) increased sulfide production (both 0.55 mM per day;
biological replicates n = 3, symbols represent mean values, error bars are s.d.).
d, Per cent of total RNA reads mapped to ANME-1 (red) and HotSeep-1
(green) after 3 days of incubation (c); biological replicates n = 3; data are mean
values; for statistical analyses see Supplementary Table 1.
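
The growth and energy figures quoted above can be put on a common scale with a short calculation. The sketch below converts doubling times into specific growth rates using the standard exponential-growth relation μ = ln 2 / t_d and compares the two energy yields; the function name is illustrative and not part of the study, and the inputs are the values stated in the text.

```python
import math


def specific_growth_rate(doubling_time_days: float) -> float:
    """Specific growth rate (per day) for exponential growth: mu = ln(2) / t_d."""
    if doubling_time_days <= 0:
        raise ValueError("doubling time must be positive")
    return math.log(2) / doubling_time_days


# Doubling times quoted in the text (days)
taom_mu = specific_growth_rate(68)         # TAOM consortia on methane
hotseep_mu_slow = specific_growth_rate(6)  # HotSeep-1 on hydrogen, slower bound
hotseep_mu_fast = specific_growth_rate(4)  # HotSeep-1 on hydrogen, faster bound

# Ratio of the quoted energy yields (kJ per mol), hydrogen versus syntrophy
energy_ratio = (-151) / (-17)

print(f"TAOM consortia: {taom_mu:.4f} per day")
print(f"HotSeep-1 on H2: {hotseep_mu_slow:.3f} to {hotseep_mu_fast:.3f} per day")
print(f"Energy yield ratio: {energy_ratio:.1f}")
```

On these numbers the hydrogen-grown strain divides roughly 11 to 17 times faster than the consortium, while the energy yield ratio is close to ninefold, consistent with the "tenfold" description as an order-of-magnitude statement.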

We compared the activity of the TAOM consortia and the hydro- 
genotrophic HotSeep-1 by physiological experiments combined with 
metagenomic and metatranscriptomic analyses and electron micro- 
scopy of the involved organisms. A classical experiment for the study 
of syntrophy in dual-species consortia is the addition of potential 
intermediates that could be theoretically produced by the ANME as 
a by-product of methane oxidation, and consumed as the electron 
donor by the partner SRB (mechanisms illustrated in Extended 
Data Fig. 1). If these compounds were relevant in interspecies transfer 
of reducing equivalents, their addition to the medium should favour 
the electron-accepting partner, and should repress the electron trans- 
fer between the consortial partners'**~’. By contrast, a model of direct 
electron transfer via nanowires as proposed in refs 7,8,10, would be 
insensitive to such external additions of potential intermediates if the 
additions do not represent an alternative, preferred substrate for one 
of the partner organisms. 

With the exception of hydrogen, none of the potential intermediates 
added as sole electron donor caused significant microbial sulfide pro- 
duction in the TAOM enrichment (Extended Data Table 3). Carbon 
monoxide and methyl sulfide even inhibited sulfide production when 
added together with methane. Carbon monoxide is known to inhibit 
cytochrome c activity, which may play an important role in intra- and 
intercellular transfer of reducing equivalents in AOM'’. Methylated 
substrates may interfere with the reverse, oxidative operation of the 
methanogenesis pathway”. The addition of colloidal zero-valent sulfur 
to the TAOM culture (supplied in concentrations from 1 to 25 mM, 
Extended Data Fig. 3a) did not result in the production of sulfide and 
sulfate as reported in a previous study with ANME2a/DSS consortia”’. 
However, with hydrogen as an electron donor (0.16 MPa), sulfide 
production rates increased three- to eightfold compared to replicate 
incubations with methane as the sole electron donor at TAOM con- 
ditions (Fig. 1c and Extended Data Table 3). We investigated further 
the influence of hydrogen on the oxidation of methane using head- 
space-free incubations (Extended Data Fig. 4). In incubations with 
methane and hydrogen, hydrogen was first selectively consumed 
and methane oxidation was repressed. When hydrogen was con- 
sumed, methane oxidation rates recovered to the same level as in 
replicate incubations with only methane, suggesting an inhibition of 
methane oxidation in the presence of hydrogen. To investigate the 
influence of hydrogen on the consortial partners, we mapped total 
RNA reads to the genome drafts of ANME-1 and HotSeep-1 after 
exposure to different substrate conditions (Fig. 1d, for read numbers 
see Supplementary Table 1). Under TAOM conditions, relative RNA 
expression patterns reflected the biomass ratio between the ANME 
and their partner bacteria (3:1) (Fig. 1d). The addition of hydrogen 
caused a strong relative increase of HotSeep-1 over ANME gene 
expression, even in the presence of methane. This indicates that if 
the partner SRB does not act as an electron sink for reverse methano- 
genesis, ANME activity is repressed; an effect of syntrophic coopera- 
tion that was predicted previously”. 


588 | NATURE | VOL 526 | 22 OCTOBER 2015 


To test the hypothesis of hydrogen production by ANME as a direct 
intermediate in TAOM (Extended Data Fig. 1) that is consumed by 
HotSeep-1, we assessed the presence and production of hydrogen 
under TAOM conditions. Maximal hydrogen concentrations were 
only about 2 Pa in the TAOM enrichments, and re-established
within 7 h after gas phase exchange (Fig. 2a). Thermodynamically,
HotSeep-1 could thrive on these low hydrogen concentrations with
an energy yield of approximately −24 kJ mol⁻¹. However, the production
of hydrogen in TAOM cultures corresponded to only ~0.5% of
the theoretical hydrogen production rates as reflected by sulfide pro- 
duction (according to the stoichiometry of reverse methanogenesis; 
Fig. 2b). This is insufficient to explain the consortial growth of 
HotSeep-1. Furthermore, we could not detect catalytic subunits of 
[FeFe] or [NiFe] hydrogenases in the ANME-1 draft genome. In con- 
clusion, hydrogen appears to be an alternative growth substrate for 
HotSeep-1 when available externally, but is not provided by ANME-1 
as an intermediate in TAOM. 
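
The comparison between observed and theoretical hydrogen production follows from simple stoichiometry, and can be sketched as below. The calculation assumes, as in the Fig. 2b legend, that reverse methanogenesis would release 4 H₂ per CH₄ oxidized and that one sulfide is formed per CH₄ in the net AOM reaction; the constant and function names are illustrative only.

```python
H2_PER_CH4 = 4       # CH4 + 2 H2O -> CO2 + 4 H2 (reverse methanogenesis)
SULFIDE_PER_CH4 = 1  # net AOM reaction: one sulfide formed per methane oxidized


def predicted_h2_rate(sulfide_rate: float) -> float:
    """Hydrogen production rate expected if all reducing equivalents
    passed through H2, in the same units as the sulfide rate."""
    return sulfide_rate * H2_PER_CH4 / SULFIDE_PER_CH4


# Values from the Fig. 2b legend (nmol per litre per minute)
observed_sulfide_rate = 104.0
observed_h2_rates = [2.0, 3.0]

predicted = predicted_h2_rate(observed_sulfide_rate)
fractions = [rate / predicted for rate in observed_h2_rates]

print(f"Predicted H2 production: {predicted:.0f} nmol per litre per minute")
for rate, fraction in zip(observed_h2_rates, fractions):
    print(f"Observed {rate:.0f} -> {100 * fraction:.2f}% of predicted")
```

The predicted rate of 416 nmol l⁻¹ min⁻¹ matches the rounded value of 420 in the figure legend, and both replicate measurements fall below 1% of it.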

An alternative explanation of the TAOM interaction could be 
direct interspecies electron transfer (DIET) between ANME-1 and 
HotSeep-1, also hypothesized as a principal mechanism for syntrophic
growth of AOM consortia*’®'””°. A switch from interspecies
hydrogen transfer to DIET has been previously shown for the dual- 
species interaction between Geobacter sulfurreducens and Geobacter 
metallireducens, benefiting both consortial partners, as evidenced by 
their increased growth rates via DIET'’*. In the tightly packed 
Geobacter consortia, a dense network of cell-to-cell connections was 
detected by transmission electron microscopy and immunogold label- 
ling, probably serving in electron transfer'®. The functioning of elec- 
tron transfer via conductive cell-to-cell connections (nanowires) is 
not fully understood, but apparently involves the expression of the 
pilin protein PilA of the type IV pili together with certain members of 
the cytochrome c family”’°’’*. Recent findings of such cytochromes 
in the ANME genome, along with redox-dependent staining of the 
intercellular matrix of the ANME-2/SRB consortia®, suggest that 
DIET could also be relevant in AOM. 

To find evidence for DIET in TAOM we analysed the genome and 
specific gene expression of ANME-1 and HotSeep-1, with the focus on 
similarities to the Geobacter consortia using DIET as the main electron 
transfer mechanism. The ANME-1 draft genome contains several 
potentially extracellular multi-haem cytochrome c proteins, some of 
which are highly expressed during TAOM, but no genes for pili forma- 
tion (Extended Data Table 4, Supplementary Table 2). However, 
HotSeep-1 comprises the genes for the biosynthesis and assembly of
type IV pili, as well as large multi-haem cytochrome c proteins, both
with high amino acid similarity to respective proteins in Geobacter
spp.” (Extended Data Table 4 and Supplementary Tables 3 and 4).
We further investigated expression patterns of these potentially
DIET-related genes in comparison to genes for AOM (mcrA) and
sulfate reduction (dsrA), representing key catabolic processes in
ANME-1 and HotSeep-1 (Fig. 3a, for statistical analyses see
Supplementary Table 5). In agreement with the results from total
RNA expression (Fig. 1d), a switch from methane to hydrogen (or
methane plus hydrogen) as an energy source caused an immediate
drop in mcrA and cytochrome expression in ANME, as well as a
reduction of the expression of the HotSeep-1 pili and cytochromes.
Comparing relative gene expression of HotSeep-1 in consortial growth
using methane, versus single growth using hydrogen, both pilA and
cytochrome c are clearly overexpressed under TAOM conditions; this
is also the case when compared to dsrA expression (Extended Data
Fig. 5, for statistical analyses see Supplementary Table 5).

Figure 2 | Hydrogen in TAOM cultures. a, Hydrogen gas pressure under
TAOM conditions (methane, 0.2 MPa; sulfate, 28 mM; 5-day incubation (filled
circle)). The dashed arrow depicts gas phase exchange with methane. Open
circles show equilibration of hydrogen in the headspace (n = 1). b, Hydrogen
production in 10 ml TAOM culture supplied with 0.2 MPa methane after
headspace exchange and addition of 10 mM molybdate (final concentration) to
inhibit hydrogen consumption. Open circles are replicate measurements with
hydrogen production of 2 and 3 nmol l⁻¹ min⁻¹. Dotted line is predicted
hydrogen production for reverse methanogenesis (CH₄ + 2H₂O →
CO₂ + 4H₂) = 420 nmol H₂ l⁻¹ min⁻¹ culture, for an observed sulfide
production rate of 104 nmol l⁻¹ min⁻¹. Both experiments were repeated once
with the same results.

Figure 3 | Expression of genes and visualization of structures attributed
to interspecies electron transfer in TAOM. a, Expression of archaeal (red)
and bacterial (green) genes in incubations with hydrogen (darker shade), or
hydrogen plus methane (lighter shade), relative to methane alone (expression
under TAOM condition = 1; biological replicates n = 3; *P < 0.05; for statistics
see Supplementary Table 5b). b-f, Micrographs of thin sections. Scale bars,
300 nm. b-d, TAOM consortia with HotSeep-1 cells (H; rod-shaped;
approximately 1 × 0.5 μm) and ANME-1 cells (A; cylindrical shape with
envelope”; 1.5 × 0.8 μm). Nanowires of 10 nm diameter and up to several
thousand nanometres in length connect both species, representative of
70 recorded images. c, d, Arrows mark the apparent origin of the wires from
the membrane of HotSeep-1 bacteria to the polar sites of ANME-1 cells.
d, HotSeep-1 cell with nanowires crossing the cell membrane (marked by the
arrow). e, f, Aggregated HotSeep-1 cells grown with hydrogen do not
develop nanowires; representative of 48 images.

This observation is supported independently by transmission elec- 
tron microscopy on thin sections of TAOM consortia. Using two 
different embedding techniques we found a dense network of pili-like 
structures connecting HotSeep-1 to ANME-1 cells (Fig. 3b, c and 
Extended Data Fig. 6a), resembling the nanowire structures found in
Geobacter consortia (visualized with the same methods, see Extended
Data Fig. 6b). In TAOM consortia the nanowires are larger, and appear
more dense, with diameters of approximately 10 nm and apparent
lengths of 100 to >1,000 nm. In agreement with the genomic patterns, 
these wires seem to be formed by the partner bacteria, connecting to the 
ANME-1 at their polar sides (Fig. 3b-d). In contrast, HotSeep-1 cells in 
mono-species aggregates isolated with hydrogen show smooth surfaces 
without such extracellular extensions (Fig. 3e, f), indicating that the 
observed intercellular structures are specific to consortial growth under 
TAOM conditions and not only related to cellular attachment. 

In conclusion, our data show that consortial growth of thermophilic 
ANME-1 archaea and HotSeep-1 bacteria is probably based on similar 
principles as those in Geobacter consortia, where DIET is mediated 
by intercellular wiring made up of pili-like structures and outer- 
membrane multi-haem cytochromes. The underlying biophysics and 
biochemistry of intercellular wiring for direct electron transfer needs 
further investigation. If this mode of syntrophic cooperation between 
the electron-generating archaea and nanowire-producing bacteria is 
also the underlying mechanism for other types of AOM consortia as 
suggested recently*, it may explain the enigmatic specificity of dual- 
species partnerships in AOM in general. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper.

METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Cultivation of TAOM consortia. Sediment-free, TAOM enrichment cultures 
were obtained after 1.5 years by semi-continuous incubation of hydrothermal vent 
sediments from Guaymas Basin with sulfate reducer medium”! and 0.225 MPa 
CH₄ (+0.025 MPa CO₂) as the sole energy source at 60 °C, as described in ref. 6.
Culture medium was replaced and samples were diluted 1:2 when sulfide concen- 
trations exceeded 12 mM. For the different experiments, subsamples of the main 
culture (biological replicates) were incubated in parallel. 

DNA extraction, sequencing and phylogenetic classification of TAOM part- 
ners. Genomic DNA was extracted as described previously from an active 
TAOM culture. The protocol encompassed three cycles of freezing and thawing, 
chemical lysis in a high-salt extraction buffer (1.5 M NaCl) by heating of the 
suspension in the presence of sodium dodecyl sulfate and hexadecyltrimethylam- 
monium bromide, and treatment with proteinase K. To amplify bacterial 16S 
ribosomal DNA genes the primer pair GM3/GM4 (ref. 33) was used. For archaeal 
16S rDNA genes the primers 20F (ref. 34) and Arc1492R (ref. 35) were selected. 
PCR reactions were performed according to ref. 6. The phylogenetic affiliation was 
inferred with the ARB software package’® and release 115 of the ARB SILVA 
database*’. Representative 16S rRNA gene sequences are deposited at NCBI with 
the accession numbers KT152859-KT152885. 

Visualization of TAOM aggregates by fluorescence in situ hybridization. Cell 
aliquots were fixed in 2% formaldehyde for 2 h at room temperature and washed with
1× PBS (pH 7.4). Fixed cell suspensions were treated with mild sonication
(Sonoplus HD70; Bandelin) and aliquots of 50-250 μl were filtered onto GTTP
filters (0.2 μm pore size, 20 mm diameter). CARD-FISH was performed as
described previously’ with the following modifications: for cell wall permeabilization,
filters were sequentially incubated in lysozyme solution (10 mg ml⁻¹ lysozyme
powder, 0.1 M Tris-HCl, 0.05 M EDTA, pH 8) for 15-30 min at 37 °C and
proteinase K solution (0.45 mU ml⁻¹, 0.1 M Tris-HCl, 0.05 M EDTA, pH 8,
0.5 M NaCl) for 2 min at room temperature. Endogenous peroxidases were inactivated
by incubating the filters in 0.15% H₂O₂ in methanol (30 min, room
temperature). The oligonucleotide probes ANME-1-350 and HotSeep-1-590 were
applied with formamide concentrations according to ref. 6. For dual CARD-FISH,
peroxidases of the first hybridization were inactivated by 0.3% H₂O₂ in methanol
(30 min, room temperature). Catalysed reporter deposition was combined with
the fluorochromes Alexa Fluor 488 and Alexa Fluor 594. Filters were stained with
DAPI (4′,6-diamidino-2-phenylindole). Micrographs were obtained by confocal
laser scanning microscopy (LSM 780; Zeiss). 

Test of potential AOM intermediates/alternative HotSeep-1 substrates. All 
experiments were performed with artificial seawater medium containing 
30 mM of carbonate buffer at TAOM cultivation temperature (60 °C), except
when specified otherwise. To ensure equilibration of gas phases, samples were
agitated on shaking tables. Highly pure gases and chemicals were used as additions
to the incubations. Standard TAOM conditions are defined here as 0.2 MPa methane
and 28 mM sulfate. To test the TAOM enrichment for substrate-specific
sulfide production, triplicate culture aliquots (10 ml in 20 ml Hungate tubes) were
supplemented with different substrates (Extended Data Table 2) at concentrations
of 20 mM, except methyl sulfide and carbon monoxide (both 0.05 MPa), and
hydrogen (0.16 MPa) with and without methane (0.2 MPa). Zero-valent sulfur
was prepared according to ref. 39 and was supplied as dissolved species. For this
compound we additionally tested sulfide development via disproportionation in a
concentration gradient from 1-12 mM final S⁰ concentration (Extended Data Fig.
6a). As positive reference, methane was provided at 0.2 MPa (at 60 °C roughly
equivalent to 1.6 mM in solution). Sulfide production in the experiments was
repeatedly measured every 3 to 4 days using the copper sulfide assay*° and absorp- 
tion spectrometry at 480 nm. TAOM rates with methane as the sole energy source 
(0.2 MPa) reached approximately 0.100 ± 0.030 μM per day, compared to a negative
control (nitrogen; <0.001 μM per day). Rates determined for other substrates
were compared to those under TAOM conditions. 

Influence of hydrogen addition on methane oxidation in TAOM cultures. To 
determine the effect of hydrogen addition on methane oxidation rates, TAOM 
culture aliquots were supplemented with methane and hydrogen (0.15 MPa and 
0.05 MPa, respectively), or only methane as control (0.15 MPa). Cultures were 
incubated headspace-free at 50°C for this experiment, because hydrogen was 
too rapidly consumed at 60°C for time-course experiments. To determine con- 
centrations of methane and hydrogen, 1 ml of medium was sampled with gas-tight 
glass syringes, and the sampled medium was concurrently replaced with substrate- 
free medium to avoid the formation of a headspace. The sampled medium was
injected through the septum of a 10 ml Exetainer filled with 1 ml NaOH and
concentrations of CH₄ and H₂ were measured as described below.

Presence and production of hydrogen in active TAOM cultures. To determine 
hydrogen concentrations at TAOM conditions, 20 ml of culture was transferred 
into 156-ml bottles at 60 °C and gas phases were repeatedly sampled using glass 
syringes (1 ml) combined with direct measurements on the gas chromatograph. 
Cultures incubated for 3 or more days reached stable hydrogen concentrations. 
A comparison to molybdate addition is provided in Extended Data Fig. 6b, c. To 
quantify molecular hydrogen production in TAOM, 20 ml of culture was supplied 
with sodium molybdate (10 mM final concentration). This molybdate concentra- 
tion assured complete inhibition of hydrogen-dependent sulfate reduction as 
shown in replicate incubations of TAOM culture (1 to 25mM molybdate) with 
hydrogen (0.1 MPa) as the sole electron donor (Extended Data Fig. 6d). Samples 
were stored at 60 °C on a shaking table and repeatedly sampled by glass syringes. 
Concentrations of methane and hydrogen were measured via gas chromatography 
coupled to flame ionization detection (Focus GC, Thermo) and via reducing 
compound photometry (RCD; Peak Performer 1 RCP; Peak Laboratories).

Determination of carbon fixation by TAOM consortia. Replicate culture aliquots
(n = 5) were incubated in 5-ml Hungate tubes supplemented with methane,
sulfate and ¹⁴C-labelled inorganic carbon (380 kBq). AOM-independent carbon
fixation was determined under N₂ atmosphere (n = 5). To determine methane
oxidation rates, replicate vials were incubated with ¹⁴C-methane (14 kBq).
Incubations were performed at 60 °C for 24 h. Samples were blotted onto
0.2-μm mixed cellulose esters membrane filters (Millipore, Merck). Filters were
dried and potential residual inorganic carbon was removed by exposing the filters
to an HCl atmosphere for 24 h. Radioactivity in liquid aliquots (0.1 ml) and filters
was determined by liquid scintillation counting (scintillation mixture; Filtercount; 
Perkin Elmer; scintillation counter 2900TR LSA; Packard). 

Cultivation of HotSeep-1 on molecular hydrogen. To isolate the hydrogeno- 
trophic sulfate reducers in the TAOM enrichment, aliquots were transferred to 
Hungate tubes (20 ml) and diluted 1:10 to 1:10° with marine sulfate reducer 
medium. All vials were amended with 0.2 MPa H₂:CO₂ (80:20) gas phase, and
additionally stored in N₂ atmosphere to prevent oxygen flux into the culture vials.
Vials were stored at the TAOM temperature optimum (60 °C) and measured for 
sulfide production using the copper sulfate assay“. To identify cultivated micro- 
organisms, the 16S rRNA gene of active hydrogenotrophic cultures was directly 
amplified from freeze-thawed pellets of culture aliquots (primer pair GM3/GM4) 
and sequenced as described above. The phylogenetic affiliation was inferred with 
the ARB software package*® and Release 115 of the ARB SILVA database”’. 
Representative sequences are deposited at NCBI with the accession numbers 
KT152886 and KT152887. 

Physiology experiments with HotSeep-1. Electron acceptor tests. Culture aliquots 
(1 ml tenfold-diluted in artificial anoxic seawater medium) were supplied with 
different potential electron acceptors (colloidal sulfur, sulfite or thiosulfate) with 
and without the addition of hydrogen. Potential growth on alternative carbon 
sources (that is, acetate, butyrate, peptone and methyl sulfide) was tested. 
Growth rates. Growth rates were independently determined from the development 
of sulfide concentrations and cell counts (from DAPI-stained cells for total cell 
numbers and from fluorescence in situ hybridized cells for specific cell numbers) 
from replicate cultures (grown from 10% inoculum). 

Growth efficiencies. Efficiencies were determined in a ¹⁴C-DIC radiotracer assay.
Replicate cultures were spiked with ¹⁴C-DIC (~5.4 MBq) and incubated with
H₂:CO₂ or, as control, with N₂:CO₂ headspace. Sulfate-dependent hydrogen con-
sumption was determined by the increase of sulfide (colourimetrically*’) and by 
the decrease of sulfate (via ion chromatography) in the medium. Fixed carbon was 
measured from culture aliquots (5 ml volume) blotted on filters as described above. 
Concentrations of radioactivity on the filter and the medium were determined 
via scintillation counting. The total carbon fixation (mmol per ml culture) was 
calculated as ¹⁴C uptake into particulate organic carbon multiplied by total DIC
[¹⁴C-POC (Bq per ml of culture) / ¹⁴C-total (Bq per ml of culture) × DIC (mmol
per ml culture)], and normalized to reducing equivalent transfer; values are
compared with the consumption of sulfide.
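
The carbon fixation formula above amounts to scaling the DIC pool by the fraction of label recovered in particulate organic carbon. A minimal sketch follows; the function name and the example numbers are hypothetical and chosen only to illustrate the arithmetic, not taken from the study.

```python
def carbon_fixed_mmol_per_ml(poc_bq_per_ml: float,
                             total_bq_per_ml: float,
                             dic_mmol_per_ml: float) -> float:
    """Total carbon fixation (mmol per ml culture):
    (14C-POC / 14C-total) x DIC, with both activities in Bq per ml."""
    if total_bq_per_ml <= 0:
        raise ValueError("total activity must be positive")
    if poc_bq_per_ml < 0 or dic_mmol_per_ml < 0:
        raise ValueError("activity and DIC must be non-negative")
    return poc_bq_per_ml / total_bq_per_ml * dic_mmol_per_ml


# Hypothetical inputs for illustration only:
# 1% of the label recovered on the filter, DIC pool of 0.03 mmol per ml.
example = carbon_fixed_mmol_per_ml(poc_bq_per_ml=50.0,
                                   total_bq_per_ml=5000.0,
                                   dic_mmol_per_ml=0.03)
print(f"Carbon fixed: {example:.6f} mmol per ml culture")
```

Normalizing this value to the reducing equivalents transferred, as described in the text, would require the measured sulfide and sulfate changes and is not shown here.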

Metagenome sequencing and draft genome assembly of ANME-1 and 
HotSeep-1. Genomic DNA was extracted from TAOM and HotSeep-1 enrich- 
ment cultures (as described above) and prepared for Illumina sequencing using the 
Nextera mate pair sample preparation kit (Illumina), following the Gel-Plus pro- 
tocol of the manufacturer’s user guide. DNA fragments with a length of approxi- 
mately 5-8kb were extracted from a preparative gel before circularization. 
Additionally a paired-end read library with insert size of 500 bp was constructed 
for the TAOM enrichment using the TruSeq library preparation kit. Libraries were 
sequenced on a MiSeq instrument (MiSeq, Illumina) in a 2 × 250 bases paired-end
run. Quality-controlled mate pair reads were assembled using the SPAdes genome
assembler v.3.5.0 (ref. 41) with default values of k and the -hqmp option.
Assembled contigs from the TAOM metagenome were binned based on tetranu- 
cleotide frequency using the Metawatt software”. ANME-1- and HotSeep-1-spe- 
cific bins were extracted for targeted reassembly using the SPAdes genome 
assembler v.3.5.0' with mapped mate pair and paired end read data and default 
values of k and subsequently were used as draft genomes. A HotSeep-1 draft 
genome was also obtained from the assembled contigs of the highly enriched 
HotSeep-1 culture metagenome (hydrogenotrophic HotSeep-1). 

Draft genome analysis. Draft genomes were annotated with Prokka’’, and the 
draft genome of HotSeep-1 (obtained from the hydrogenotrophic HotSeep-1 
enrichment) was additionally annotated with an in-house pipeline and analysed 
using GenDB™ and JCoast*’. The annotation of reported genes was manually 
curated. An expectation (E)-value cut-off of 1 * 107° was considered for all pre- 
dictions of putative protein functions. Identity of the enriched hydrogenotrophic 
HotSeep-1 and the TAOM partner HotSeep-1 was evaluated by pairwise blast 
search of the nucleotide sequence of the 16S and 23S rRNA genes, functional 
and housekeeping genes and the intergenic spacer region (Extended Data 
Table 2) derived from the draft genome of the TAOM partner HotSeep-1 (query) 
versus the hydrogenotrophic HotSeep-1 (subject). To verify that the organisms 
belong to the same species the average nucleotide identity (ANI) and the tetra- 
nucleotide frequency correlation of the two draft genomes were determined using 
JSpecies*® (v.1.2.1). Analyses resulted in tetranucleotide frequency correlation of 
0.999 and ANI of >99%. To check for absence of ANME-1 in the hydrogenotrophic
HotSeep-1 culture, metagenomic reads were mapped to the SILVA SSU
119 reference database (bbmap v.35 and phyloFlash v.1.5) for phylogenetic clas-
sification at minimum identities of 90%, 95% and 97% resulting in approximately 
3,500, 2,100 and 1,500 classified 16S rRNA gene fragments, respectively, which 
were screened for hits to ANME related sequences. 
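
To illustrate what a tetranucleotide frequency correlation measures, the sketch below counts overlapping 4-mers in two sequences and computes the Pearson correlation of the resulting frequency vectors. This is a simplified illustration: it uses raw frequencies on a single strand, whereas JSpecies applies its own normalization, so values from this sketch are not directly comparable to the 0.999 reported above. All function names are illustrative.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides in a fixed order
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freqs(seq: str) -> list[float]:
    """Relative frequency of each tetranucleotide (overlapping windows).
    Windows containing characters other than A, C, G, T are skipped."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no valid tetranucleotides")
    return [counts[k] / total for k in TETRAMERS]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant vector")
    return cov / (sx * sy)


def tetra_correlation(seq_a: str, seq_b: str) -> float:
    """Correlation of tetranucleotide usage between two sequences."""
    return pearson(tetra_freqs(seq_a), tetra_freqs(seq_b))
```

Two draft genomes from the same organism are expected to give a correlation close to 1, which is why the measure is useful both for binning contigs and for confirming that two cultures contain the same strain.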

To identify potential cytochrome c and type IV pili (T4P) genes in the draft 
genomes of ANME-1 and HotSeep-1, protein domains were predicted using 
hmmscan (HMMER 3.0%’) with the PfamA* and TIGRFAM” reference data- 
bases. Potential cytochromes were identified by the CXXCH motif and cytochrome
c-specific protein domain models. Potential T4P genes were identified
using protein models related to T4P. ANME-1 cytochrome and HotSeep-1 cytochrome
and pili genes were compared for their best matching hits in the
G. sulfurreducens (strain PCA) and G. metallireducens genomes and the NCBI
non-redundant protein database using blastp. Cytochrome annotation was based
on detected protein domains in PfamA; pili annotation was based on detected
protein domains and amino acid sequence. Subcellular localization was
predicted with PSORTb”® (v.3.0.2). For cytochromes the number of potential
haem-binding sites was derived from the abundance of the CXXCH motif. For
sequence comparison to the NCBI non-redundant protein database and Geobacter 
spp. and for details on protein domains and subcellular localization prediction 
see Supplementary Table 2a, b, (ANME-1 cytochromes), Supplementary Table 
3a, b, (HotSeep-1 cytochromes) and Supplementary Table 4a, b (HotSeep-1 
Type IV pili biogenesis). Representative sequences are deposited in GenBank 
under the accession numbers KT759143-KT759147, KT795302-KT795321, 
KT795322 and KT795323. 
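
The motif-based count of potential haem-binding sites can be illustrated with a minimal sketch. It assumes only that a site is any occurrence of Cys, two arbitrary residues, Cys, His (CXXCH) in the amino acid sequence; the function name and the toy sequence are hypothetical and are not taken from the study's pipeline or from either draft genome.

```python
import re

# Haem-binding motif of c-type cytochromes: Cys, any two residues, Cys, His.
# The zero-width lookahead means overlapping occurrences are counted too.
CXXCH = re.compile(r"(?=C..CH)")


def count_haem_motifs(protein: str) -> int:
    """Return the number of CXXCH motifs in an amino acid sequence."""
    return len(CXXCH.findall(protein.upper()))


# Hypothetical toy sequence containing the motifs CAACH and CKTCH.
toy_sequence = "MKACAACHGGSCKTCHAA"
print(count_haem_motifs(toy_sequence))
```

A count of this kind only flags candidates; in the workflow described above it is combined with cytochrome c-specific domain models.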

The ANME-1 draft genome was searched for genes encoding catalytic subunits 
of hydrogenases using blastp search against known genes of catalytic subunits of 
[NiFe] and [FeFe] hydrogenases (mvhA, echA, frhA, vhuA, vhtA, ehaO, hymC). 
Annotation of genes with hits was evaluated by blastp search against the NCBI 
non-redundant protein database for best matching reference sequences related to 
hydrogenases, but none were found. 

Transcriptome analysis of TAOM and HotSeep-1. To collect cells for transcrip- 
tome analyses a 3.5-day experiment with replicates of 20 ml culture in 60-ml vials 
was carried out (Fig. 1). From triplicate TAOM cultures incubated with methane 
as control, with hydrogen, with a methane/hydrogen mixture, or with nitrogen as
negative control, ~80% of the enrichment medium was removed and RNA was
preserved using pre-heated RNAlater (Life Technologies, ThermoFisher Scientific).
Total RNA was extracted using the Quick-RNA MiniPrep kit (Zymo Research), 
treated with DNase I (Roche) and purified using the RNeasy MinElute Cleanup kit 
(Qiagen) following the manufacturer’s recommendations. Removal of rRNA was 
omitted and total RNA was prepared for sequencing using the TruSeq stranded 
mRNA library prep kit (Illumina) following the manufacturer’s guidelines. The 
cDNA library was sequenced on a MiSeq instrument (MiSeq, Illumina) generating 
between 2 and 3 million 150-bp single-end reads per library. Quality-controlled
reads were mapped to the draft genomes of HotSeep-1 and ANME-1 using bbmap
(v.35) with a minimum mapping identity of 98%. To quantify gene expression,
unambiguously mapped reads per gene were counted using bedtools multicov
(v.2.24.0). To compare relative expression patterns within each organism across
treatments, read counts per feature were converted to transcripts per million
(TPM), which is the abundance of a specific gene (i) relative to the abundance
and length of all other transcribed genes (j) observed in one million sequenced
reads, calculated according to ref. 51:

$$\mathrm{TPM}_i = \frac{X_i}{l_i} \times \left( \frac{1}{\sum_j X_j / l_j} \right) \times 10^{6}$$

where X = counts and l = length (bp) per gene. Relative changes in expression of
selected genes were calculated by comparing TPM-normalized expression data of
the H₂ and H₂ + CH₄ treatments to those under TAOM (control) conditions.
Differential expression (P value, fold change and effect size) between control
(TAOM condition) and treatment (H₂ or H₂ + CH₄) was computed with the
aldex2 R package for ANOVA-like differential expression analysis. Raw read
numbers, read mapping data and statistical analysis are provided in 
Supplementary Table 1 (total expression) and Supplementary Table 5 (specific 
gene expression). 
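
The TPM conversion described above can be sketched in a few lines. This is a minimal illustration and not the authors' script; the function name `tpm` and the example counts and gene lengths are hypothetical.

```python
def tpm(counts, lengths):
    """Convert per-gene read counts to transcripts per million (TPM).

    counts  -- reads (or fragments) assigned to each gene, X_i
    lengths -- gene lengths in bp, l_i
    """
    # Length-normalized rate per gene: X_i / l_i
    rates = [x / l for x, l in zip(counts, lengths)]
    # Sum of the rates over all genes j
    total = sum(rates)
    # Scale so that the values of all genes add up to one million
    return [rate / total * 1e6 for rate in rates]


# Hypothetical example: three genes with counts proportional to their lengths
example = tpm([10, 20, 30], [1000, 2000, 3000])
print(example)
```

Because each value is scaled by the sum of length-normalized rates, the TPM values of one library always add up to 10⁶, which is what allows relative expression patterns to be compared across treatments.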

For HotSeep-1 transcriptomes, total RNA was extracted from triplicate cultures
(50 ml) grown on hydrogen/CO₂ following the same procedure as described for
TAOM enrichments (see above). Removal of rRNA was omitted and total RNA
was prepared for sequencing using the TruSeq stranded mRNA library prep kit
(Illumina) following the manufacturer’s guidelines. The cDNA library was
sequenced on a MiSeq instrument (MiSeq, Illumina), generating between 6.4 and
6.9 million 75-bp paired-end reads per library. Quality-controlled reads were
mapped to the draft genome of HotSeep-1 using bbmap (v.35) with a minimum
mapping identity of 98%. To quantify gene expression, unambiguously mapped
reads per gene were counted using featureCounts (part of Subread, v.1.4.6) with
the -p option to count fragments instead of reads. Fragment counts per gene were
converted to transcripts per million (TPM) as described above for TAOM
transcriptome analyses.

Cultivation of Geobacter consortia. Active cultures of G. sulfurreducens 
(strain PCA; DSM 12127) and G. metallireducens (strain GS-15; DSM 7210) were 
mixed in fresh medium (DSM Medium 826) supplied with Na₂-fumarate (50 mM)
and ethanol (20 mM) according to ref. 10 and cultivated anaerobically at 33 °C. 
After subsequent transfers (1% inoculum) a well-growing culture consisting of 
reddish microbial aggregates developed, which was used for thin sectioning and 
electron microscopy. 

Transmission electron microscopy. The cell material was harvested at 
2,000 r.p.m. using a Stat Spin Microprep 2 table-top centrifuge. After centrifu- 
gation the pellet was fixed by immersion using 2% glutaraldehyde in 0.1 M caco- 
dylate buffer at pH 7.4. Fixation was performed for 60 min at room temperature. 
The fixed pellet was immobilized with 2% agarose in cacodylate buffer at pH 7.4. 
The pellet was cubed and the pieces carefully washed with buffer and further fixed 
in 1% osmium tetroxide. After pre-embedding staining with 1% uranyl acetate, 
samples were dehydrated and embedded in Agar 100 (Epon 812 equivalent). As an 
independent complementary method (shown in Extended Data Fig. 5a), samples 
were placed in aluminium platelets of 150 µm depth containing 1-hexadecene
(ref. 54). The platelets were frozen using a Leica EM HPM100 high-pressure freezer
(Leica Mikrosysteme Vertrieb GmbH). The frozen samples were transferred to an
Automatic Freeze Substitution Unit Leica EM AFS2 and substituted at −90 °C in a
solution containing anhydrous acetone and 0.1% tannic acid for 24 h, and in
anhydrous acetone, 2% OsO₄ and 0.5% anhydrous glutaraldehyde (EMS Electron
Microscopical Science) for an additional 8 h. After a further incubation over
20 h at −20 °C, samples were warmed up to +4 °C and subsequently washed with
anhydrous acetone. The samples were embedded at room temperature in Agar
100 at 60 °C over 24 h. Thin sections (30-60 nm) were counterstained with uranyl
acetate and lead citrate and examined using a Philips CM 120 transmission elec- 
tron microscope (Philips Inc.). In total, we recorded more than 200 views on 
TAOM consortia, 64 views on HotSeep-1 and 90 views of Geobacter consortia. 

Thermodynamic calculations. Free energy yields (ΔG_rxn) were calculated
according to the equation:

$$\Delta G_{\mathrm{rxn}} = \Delta G^{\circ}_{(T)} + R\,T \ln \frac{\prod (P_i)^{n}}{\prod (R_i)^{n}}$$

including the gas constant R, the temperature T (K) and the measured activities/
partial pressures of the respective products P_i and reactants R_i in their respective
stoichiometric appearance (n) in the reaction. Values consider activities and
fugacity of respective compounds. The temperature-corrected standard free energies
ΔG°(T) were determined according to ref. 55.
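
As a worked illustration of this kind of free energy calculation, the following minimal sketch evaluates the standard relation ΔG = ΔG°(T) + RT ln Q, with Q built from activities or partial pressures raised to their stoichiometric coefficients. The function name and every numerical value in the usage lines are hypothetical placeholders and are not data from this study.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def delta_g_rxn(delta_g0_t, temperature_k, products, reactants):
    """Free energy yield (kJ per mol reaction) under non-standard conditions.

    delta_g0_t    -- temperature-corrected standard free energy (kJ mol^-1)
    temperature_k -- temperature in kelvin
    products, reactants -- lists of (activity, stoichiometric coefficient)
    """
    # ln Q = sum(n * ln a) over products minus sum(n * ln a) over reactants
    ln_q = sum(n * math.log(a) for a, n in products)
    ln_q -= sum(n * math.log(a) for a, n in reactants)
    return delta_g0_t + R * temperature_k * ln_q


# Hypothetical inputs: one product at activity 1e-3, two reactants at 1e-2 and 1e-1
print(delta_g_rxn(-20.0, 333.15, [(1e-3, 1)], [(1e-2, 1), (1e-1, 1)]))
```

Working with the sum of logarithms instead of the raw quotient avoids numerical underflow when activities are very small.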

Extended Data Figure 1 | Models of possible species interaction
mechanisms in TAOM tested in this study. a, Transfer of molecular
intermediates such as hydrogen. b, Incomplete reduction of sulfate in ANME
and zero-valent sulfur transfer to the partner bacteria. c, Direct interspecies
electron transfer via conductive nanowires.

Extended Data Figure 2 | Visualization of and growth experiments with
HotSeep-1. a, Representative fluorescence micrograph of HotSeep-1 culture
(probe HotSeep-1-590; 22 similar images obtained). Cells are solitary or
form small aggregates. Scale bar, 10 µm. b, c, Semi-logarithmic illustration of
the development of sulfide (b) or numbers of cells and resulting doubling
times (c) (doubling time = ln(2)/exponential factor of the regression curve)
during incubation of the HotSeep-1 culture with hydrogen as the sole energy
source and sulfate. Biological replicates n = 3; data are presented as mean ± s.d.,
lines of best fit defined by least squares method.
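
The doubling-time relation quoted in this caption can be sketched as follows. The example assumes an ordinary least-squares fit of ln(value) against time, which is one common way to obtain the exponential factor; the function name and the synthetic data are hypothetical and are not measurements from this study.

```python
import math


def doubling_time(times, values):
    """Doubling time from a least-squares fit of ln(value) against time."""
    logs = [math.log(v) for v in values]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    # Slope of the regression line = exponential factor k in N(t) = N0 * exp(k t)
    k = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    k /= sum((t - mean_t) ** 2 for t in times)
    return math.log(2) / k


# Synthetic series that doubles every 3 time units
days = [0, 1, 2, 3, 4]
cells = [1e6 * 2 ** (d / 3) for d in days]
print(doubling_time(days, cells))
```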

Extended Data Figure 3 | Effect of zero-valent elemental sulfur and
molybdate additions on TAOM. a, Sulfide production in response to
zero-valent (colloidal) sulfur addition versus TAOM conditions; zero-valent sulfur
did not cause sulfide formation. b, c, Monitoring of hydrogen partial
pressures at TAOM conditions (open circles) versus extra addition of 10 mM
molybdate (filled circles) for either the full time series (b) or the first 10 h
(c). Molybdate addition caused tenfold higher hydrogen concentrations than
the TAOM condition. d, Inhibition of methane-dependent sulfide production
at different molybdate concentrations. Biological replicates n = 3; symbols
represent mean values; error bars are s.d.; b and c show a single time series with
the same culture.

Extended Data Figure 4 | Effect of hydrogen on microbial methane
oxidation. a, Methane (0.15 MPa) supplied as the sole electron source was
steadily consumed over time by TAOM. b, When both methane (0.15 MPa)
and hydrogen (0.05 MPa) were added, hydrogen was rapidly consumed
(grey bars), whereas methane consumption was reversibly inhibited (green line)
until hydrogen was fully consumed. Afterwards methane consumption
occurred at the same rate as in the control with only methane (a). Methane,
technical replicates n = 3; symbols represent mean values; error bars are s.d.;
hydrogen, single measurements. Experiment was replicated once in the
laboratory.

Extended Data Figure 5 | Relative expression of marker genes of HotSeep-1
in consortial growth on methane (TAOM) versus enrichment on hydrogen.
Genes encoding proteins apparently involved in direct interspecies electron
transfer (CytC and PilA) are strongly overexpressed during TAOM (red)
compared to hydrogenotrophic growth (green). Gene expression given in
terms of TPM (transcripts per million). Biological replicates n = 3; error
bars are s.d.

Extended Data Figure 6 | Thin-sections of TAOM and dual species
Geobacter spp. aggregates. a, TAOM culture. High-pressure frozen ANME-1
cells (A) have a cylindrical shape and a size of 1.5 × 0.8 µm, appearing
circular in cross-section, and rectangular when cut along the axis. Their cell
content shows a high contrast. A* indicates cell envelopes. HotSeep-1 cells (H)
are smaller (approximately 1 × 0.5 µm), of rod-like shape, and have lower
contrast. The matrix between the cells is largely filled with filaments.
Representative of 24 images. Scale bar, 3 µm. b, Thin section of Geobacter
consortium with intercellular nanowires using the same embedding techniques
as for TAOM consortia, representative of 20 images. Scale bar, 300 nm.

Extended Data Table 1 | Phylogenetic affiliation of cloned 16S rRNA gene sequences obtained from TAOM enrichments in 2010 (compiled
from ref. 6) and after 1.5 years of cultivation in 2012 (this study)

Phylogenetic group                     No. of clones
                                       2010, slurry    2012, sediment-free
Archaea
Euryarchaeota
ANME-1
ANME-1-Guaymas cluster                 46 (82%)        148 (89%)
Other related ANME-1                   -               7
Thermoplasmatales (19c-33 related)     6               7
Thermococcales                         -               1
Others                                 4               3
Sum                                    56              166
Bacteria
Proteobacteria
Betaproteobacteria                     3               1
Gammaproteobacteria                    11              -
Deltaproteobacteria
HotSeep-1-Cluster                      124 (60%)       89 (59%)
DSS group                              1               1
Others                                 16              4
Acidobacteria                          1               -
Actinobacteria                         3               -
Candidate Division OD1                 6               1
Candidate Division OP3                 5               40
Candidate Division OP8                 21              6
Chloroflexi                            8               -
Others                                 8               9
Sum                                    207             151

Extended Data Table 2 | Pairwise comparison of nucleotide sequences from the HotSeep-1 draft genomes derived from the TAOM culture
versus the HotSeep-1 culture with hydrogen

Feature                      Identity (bp)   Identity (%)   Gaps     Query coverage   E value
16S rRNA                     1554/1555       99             0/1555   100              0
23S rRNA                     3025/3029       99             0/2029   100              0
ITS (23S-16S)                270/271         99             0/271    100              3.00E-145
dsrA                         1406/1438       98             0/1438   100              0
aprA                         1905/1905       100            0/1905   100              0
Hydrogenase small subunit    1437/1437       100            0/1437   100              0
Hydrogenase large subunit    916/918         100            0/918    99               0
dnaK                         1884/1893       99             0/1893   100              0

ITS, internal transcribed spacer.

Extended Data Table 3 | Effect of potential intermediates in AOM on sulfide production of TAOM culture (n = 3 replicates, 20 days incubation)

Substrate             Sulfide production
                      plus substrate    plus methane
Control (no donor)    −
Methane               +
Colloidal sulfur      −                 +
Hydrogen              +++               +++
Carbon monoxide       −                 −
Methyl sulfide        −                 −
Methanol              −                 +
Acetate               −                 +
Formate               −                 +

−, sulfide production at level of negative control; +, sulfide production similar to TAOM under standard conditions; +++, sulfide production tripled compared to TAOM standard conditions.

Extended Data Table 4 | Genes encoding cytochrome c proteins identified in thermophilic ANME-1 and HotSeep-1 draft genomes, and for
type IV pili biogenesis identified in the HotSeep-1 draft genome with expression >20 transcripts per million.


Cytochrome c type : : Expression Expression 
Seed on Phi” . Eason - ei heme TAOM anaes in H, 
domain prediction (TPM) treatment 
ANME-1 
Cytochrom_c3_2 Unknown (CM,CW,E) 8 1,063 i) 
Cytochrome_C7 Unknown (CM,CW,E) 4 603 i 
Cytochrome_C7 Extracellular 4 506 i 
Cytochrom_NNT Cytoplasmic 5 73 i] 
HotSeep-1 
Paired_CXXCH_1 Extracellular 6 2,485 J 
Cytochrom_ Cll Periplasmic 4 1,011 : 
Paired CXXCH_1 Unknown (CM,OM,P,E) 7 974 . 
Cytochrome_C554 Unknown (CM,OM,P,E) 5 881 
Cytochrom_ CII Periplasmic 4 179 - 
Cytochrome_C7 Cytoplasmic Membrane 5 95 : 
Paired CXXCH_1 Cytoplasmic 10 95 - 
Cytochrom_c3_2 Unknown (CM,P,E) 12 86 : 
Cytochrom_c3 2 Periplasmic 12 74 - 
Cytochrome_C554 Unknown (CM,P,E) 4 24 | 
, : : Expression Expression 
Predicted pili protein jaesitecdion SORT) i dee in TAOM change in Hy 
(TPM) treatment 
HotSeep-1 
assembly protein (piJA) Extracellular 74 1084 L 
retraction ATPase (pi/T) Cytoplasmic 40 51 1 
assembly protein (pi/Y) Extracellular 26 46 I) 
assembly ATPase (pi/B) Cytoplasmic 47 26 - 
secretion (pi/Q) Unknown (OM, C) 32 26 - 
assembly protein (pi/A) | Cytoplasmic Membrane 41 23 1 
retraction ATPase (pi/T) Cytoplasmic 55 21 - 
assembly protein (piJ/M) Cytoplasmic 35 21 - 
assembly protein (pi/O) Cytoplasmic Membrane 35 21 1 


Genes in bold are presented in Fig. 3a. CM, cytoplasmic membrane; CW, cell wall; E, extracellular; OM, outer membrane; P, periplasm; TPM, transcripts per million; ↑, upregulated by a factor of 2; ↓, downregulated
by a factor of 2; -, change smaller than a factor of 2.

Dynamic m⁶A mRNA methylation directs
translational control of heat shock response


Jun Zhou, Ji Wan, Xiangwei Gao, Xingqian Zhang, Samie R. Jaffrey & Shu-Bing Qian


The most abundant mRNA post-transcriptional modification
is N⁶-methyladenosine (m⁶A), which has broad roles in RNA
biology. In mammalian cells, the asymmetric distribution of
m⁶A along mRNAs results in relatively less methylation in the
5′ untranslated region (5′UTR) compared to other regions.
However, whether and how 5′UTR methylation is regulated is
poorly understood. Despite the crucial role of the 5′UTR in
translation initiation, very little is known about whether m⁶A
modification influences mRNA translation. Here we show that in response
to heat shock stress, certain adenosines within the 5′UTR of newly
transcribed mRNAs are preferentially methylated. We find that the
dynamic 5′UTR methylation is a result of stress-induced nuclear
localization of YTHDF2, a well-characterized m⁶A ‘reader’. Upon
heat shock stress, the nuclear YTHDF2 preserves 5′UTR
methylation of stress-induced transcripts by limiting the m⁶A ‘eraser’ FTO
from demethylation. Remarkably, the increased 5′UTR
methylation in the form of m⁶A promotes cap-independent translation
initiation, providing a mechanism for selective mRNA translation
under heat shock stress. Using Hsp70 mRNA as an example,
we demonstrate that a single m⁶A modification site in the
5′UTR enables translation initiation independent of the 5′ end
N⁷-methylguanosine cap. The elucidation of the dynamic features
of 5′UTR methylation and its critical role in cap-independent
translation not only expands the breadth of physiological roles of
m⁶A, but also uncovers a previously unappreciated translational
control mechanism in heat shock response.

Given the reversible nature of m⁶A mRNA methylation, we
sought to assess the potential impact of heat shock stress on m⁶A
modification of eukaryotic mRNAs. Using immunofluorescence
staining, we first examined the subcellular localization of the entire m⁶A
machinery in a mouse embryonic fibroblast (MEF) cell line before and
after heat shock stress. It is believed that m⁶A modification occurs
primarily at nuclear speckles, whereas its functionality takes place
in the cytosol (Fig. 1a). Consistent with this notion, both the m⁶A
‘writers’ (METTL3, METTL14, WTAP) and the eraser FTO were
predominantly present in the nucleus, whereas the majority of the reader
YTHDF2 resided in the cytosol (Fig. 1b and Extended Data Fig. 1). In
response to heat shock stress, neither the writers nor the eraser
changed their nuclear localization (Extended Data Fig. 1). Surprisingly,
nearly all of the YTHDF2 molecules were relocated into the nucleus
from the cytosol upon heat shock stress (Fig. 1b). The same
phenomenon holds true in HeLa cells. Intriguingly, the protein level of
YTHDF2 was also markedly increased after heat shock stress in a
manner similar to Hsp70 induction (Fig. 1c). In contrast, neither the
m⁶A writers nor the eraser showed any differences in protein levels
upon stress. Supporting the stress-induced transcriptional
upregulation of YTHDF2, real-time PCR revealed a nearly fourfold increase of
mRNA abundance after heat shock stress (Fig. 1d). The increased
YTHDF2 abundance was not due to altered mRNA degradation since
heat shock stress had negligible effects on mRNA stability (Extended
Data Fig. 2a). Notably, YTHDF2 exhibited a relatively short half-life
(t1/2 < 1 h) in cells, supporting the importance of stress-induced
transcriptional upregulation. Genes encoding other YTH domain
family proteins like YTHDF1 and YTHDF3 also showed upregulation,
although to a lesser extent (Extended Data Fig. 2b). Using a mouse
fibroblast cell line lacking the heat shock transcription factor 1
(HSF1), we confirmed that YTHDF2 is subject to regulation by
HSF1 (Extended Data Fig. 2c). The unexpected stress-inducible feature
of YTHDF2 suggests a potential role of m⁶A modification in heat
shock response.

Although YTHDF2 primarily serves as the reader of m⁶A, recent
proteomic data revealed that YTHDF2 has an extensive physical
interaction with the components of m⁶A writers. Given their
co-localization upon heat shock stress, we postulated that the nuclear presence
of YTHDF2 could influence the m⁶A modification and alter the
landscape of mRNA methylomes. Using an optimized m⁶A-seq
procedure, we sequenced the entire methylated RNA species purified
from MEF cells with or without heat shock stress. From a total of
15,454 putative methylation sites, we confirmed the m⁶A consensus
sequence motif as GGAC (where the A is modified)
(Extended Data Fig. 3 and Supplementary Table 1). Consistent with
previous reports, the majority of m⁶A sites are enriched in the
vicinity of the stop codon and in the 3′UTR (Fig. 2a). Unexpectedly,
heat shock stress led to an elevated m⁶A peak in the 5′UTR, but not
other regions. Reasoning that only a handful of genes undergo
upregulation as a result of heat shock response, we compared the levels
of m⁶A modification between stress-inducible and non-inducible
transcripts defined by RNA-seq. It is clear that the upregulated transcripts
showed greater m⁶A modification in the 5′UTR than the transcripts
downregulated upon stress (Fig. 2b). We next stratified transcripts
based on differential changes of m⁶A modification in the 5′UTR.
While transcripts with elevated 5′UTR methylation are mostly
upregulated in response to stress, transcripts exhibiting decreased 5′UTR
methylation are largely downregulated (Fig. 2c, P < 0.001,
Mann-Whitney test). One particular example of stress-induced transcripts
is the Hsp70 gene HSPA1A, which not only showed a 90-fold increase
of mRNA levels after heat shock, but also displayed a prominent m⁶A
peak in the 5′UTR (Fig. 2d). By contrast, the constitutively expressed
Hsc70 gene HSPA8 showed only a minor increase in both the mRNA
level and the m⁶A modification in response to heat shock stress
(Extended Data Fig. 4). These results suggest that the increased
5′UTR methylation selectively occurs on the stress-inducible mRNAs.

To examine whether the elevated 5′UTR methylation upon heat
shock stress is a result of nuclear localization of YTHDF2, we silenced
YTHDF2 in MEF cells using lentiviruses expressing short hairpin
RNAs. Remarkably, MEF cells lacking YTHDF2 demonstrated a
substantial loss of m⁶A modification in the 5′UTR (Fig. 2e). Upon heat
shock stress, these cells no longer showed the elevated 5′UTR
methylation as seen in control cells. The abolished 5′UTR methylation in the
absence of YTHDF2 was clearly exemplified in HSPA1A, which exhibited
only background m⁶A modification in the 5′UTR (Extended Data
Fig. 5). This result indicates a novel function of YTHDF2 in heat shock
response by promoting 5′UTR methylation on mRNAs transcribed
during stress.

YTHDF2 is not a methyltransferase per se, and does not bind to
mRNAs without prior m⁶A modification. How does the nuclear
presence of YTHDF2 promote selective methylation in the 5′UTR?
One possibility is that YTHDF2 protects the pre-existing m⁶A from
FTO-mediated demethylation. Upon heat shock stress, the nuclear
localization of YTHDF2 probably limits the accessibility of FTO to
newly minted m⁶A sites, thereby tilting the equilibrium towards
methylation. Indeed, an in vitro m⁶A binding and demethylation assay
confirmed direct competition between FTO and YTHDF2 (Extended
Data Fig. 6). To investigate whether FTO preferentially removes m⁶A
modification from the 5′UTR, we knocked down FTO from MEF cells
and examined the m⁶A distribution across the entire transcriptome.
Notably, only the 5′UTR region showed an increase of m⁶A density in
cells lacking FTO (Fig. 2f). Additionally, the 5′UTR methylation
showed no further increase upon heat shock stress in the absence
of FTO.

Figure 1 | YTHDF2 changes cellular localization and expression levels in
response to heat shock stress. a, Schematic of m⁶A modification machinery in
mammalian cells. b, Subcellular localization of YTHDF2 in MEF and HeLa
cells before or 2 h after heat shock (42 °C, 1 h). Bar, 10 µm. Images are
representative of at least 50 cells. c, Immunoblotting of MEF cells after heat
shock stress (42 °C, 1 h). N, no heat shock. The right panel shows the
relative protein levels quantified by densitometry and normalized to β-actin.
Representative of three biological replicates. d, Same samples in c were used for
RNA extraction and real-time PCR. Relative levels of indicated transcripts
are normalized to β-actin. Error bars, mean ± s.e.m.; *P < 0.05, **P < 0.01,
unpaired two-tailed t-test; n = 3 biological replicates (c and d).

Figure 2 | Altered m⁶A profiles in MEF cells in response to heat shock stress.
a, Metagene profiles of m⁶A distribution across the transcriptome of cells
before or 2 h after heat shock (42 °C, 1 h). Black arrow indicates the m⁶A peak in
the 5′UTR region. b, Transcripts are stratified by different expression levels
after heat shock stress, followed by metagene profiles of m⁶A distribution.
c, A box plot depicting fold changes of mRNA levels after heat shock for
transcripts showing increased or decreased m⁶A modification in the 5′UTR.
Box plot centre line (black), mean; whiskers, 5th and 95th percentiles; red line,
median. d, An example of stress-induced transcript HSPA1A harbouring m⁶A
peaks. IP, immunoprecipitation. e, Metagene profiles of m⁶A distribution
across the transcriptome of cells with or without YTHDF2 knockdown, before
or after heat shock stress. f, Metagene profiles of m⁶A distribution across the
transcriptome of cells with or without FTO knockdown, before or after heat
shock stress.

592 | NATURE | VOL 526 | 22 OCTOBER 2015

The 5′UTR is crucial in mediating translation initiation of
eukaryotic mRNAs. Under stress conditions, cap-dependent
translation is generally suppressed. However, subsets of transcripts are
selectively translated via a poorly understood cap-independent
mechanism. To investigate whether differential methylation of the
5′UTR influences the translational status of these mRNAs, we
conducted ribosome profiling of MEF cells with or without heat shock
stress. Among the genes undergoing stress-induced transcriptional
upregulation, many not only showed elevated m⁶A modification in
the 5′UTR, but also demonstrated increased ribosome occupancy in
the coding region (Fig. 3a). Several prominent examples are genes
encoding heat shock proteins, in particular Hsp70 (Supplementary
Table 2). Therefore, the coordinated upregulation of transcription
and 5′UTR methylation is coupled with robust translation in response
to heat shock stress.

To validate the causal relationship between stress-induced 5′UTR
methylation and selective translation, we examined Hsp70 synthesis in
cells with differential m⁶A modification. Knocking down YTHDF2
leads to depleted 5′UTR methylation, as revealed by m⁶A-seq
(Fig. 2e). Indeed, direct m⁶A blotting of HSPA1A purified from
heat-shock-stressed MEFs confirmed the marked reduction of
methylation in cells lacking YTHDF2 (Fig. 3b). Remarkably, the
heat-shock-induced Hsp70 synthesis was substantially reduced in the absence of
YTHDF2 (Fig. 3c). The comparable Hsp70 mRNA levels in cells with
or without YTHDF2 knockdown indicate that the reduced Hsp70
synthesis is a result of translational deficiency (Extended Data
Fig. 7). Further supporting this notion, the Hsp70 transcript, but not
GAPDH, showed much less enrichment in the polysomes of MEF cells
lacking YTHDF2 (Fig. 3d).

Reasoning that YTHDF2 competes with FTO in preserving 5′UTR
m⁶A modification, we speculated that FTO knockdown would
increase the 5′UTR methylation as well as the translation efficiency
of Hsp70 mRNA. This was indeed the case. Direct m⁶A blotting of
HSPA1A purified from stressed MEFs lacking FTO revealed a clear
increase of methylation when compared to the scramble control
(Extended Data Fig. 8). Importantly, FTO knockdown potentiated
the synthesis of Hsp70 after heat shock stress. Collectively, these results
established the functional connection between dynamic 5′UTR
methylation and selective mRNA translation during stress.

Figure 3 | m⁶A modification promotes selective translation under heat
shock stress. a, A 3D plot depicting fold changes (log₂) of mRNA abundance,
coding sequence ribosome occupancy (Ribo-seq), and 5′UTR m⁶A levels
in MEF cells after heat shock stress. b, m⁶A blotting of HSPA1A purified from
MEFs with or without YTHDF2 knockdown. Messenger RNAs synthesized
by in vitro transcription in the absence or presence of m⁶A were used as
control. Images are representative of two biological replicates.
c, Immunoblotting of MEF cells with or without YTHDF2 knockdown after heat shock
stress (42 °C, 1 h). N, no heat shock. The right panel shows the relative protein
levels quantified by densitometry and normalized to β-actin. Blots are
representative of three biological replicates. d, MEF cells with or without
YTHDF2 knockdown were subject to heat shock stress followed by sucrose
gradient sedimentation. Specific mRNA levels in polysome fractions were
measured by quantitative PCR. The values are first normalized to the spike-in
control then to the total. Error bars, mean ± s.e.m.; *P < 0.05, unpaired
two-tailed t-test; n = 3 biological replicates (c and d).

It is commonly believed that the 5′UTR of Hsp70 mRNA recruits
the translational machinery via an internal ribosome entry site
(IRES). However, conflicting results exist and the exact
cap-independent translation-promoting determinants remain elusive.
Given the fact that the normal 5′ end cap structure is a methylated
purine (N⁷-methylguanosine, m⁷G), we hypothesize that the
stress-induced m⁶A in the 5′UTR enables selective translation by acting as a
functional cap substitute. To test this hypothesis, we performed a
firefly luciferase (Fluc) reporter assay in MEF cells by transfecting
mRNAs synthesized in the absence or presence of m⁶A (Fig. 4a). For
the messenger without 5′UTR, random incorporation of m⁶A slightly
reduced the Fluc activity after mRNA transfection. In the presence of the
5′UTR from Hsp70, but not tubulin, the incorporation of m⁶A
markedly increased the Fluc activity in transfected MEF cells. Notably, m⁶A
incorporation does not affect the stability of the synthesized mRNAs in
transfected cells (Extended Data Fig. 9a). We next replaced the 5′ end
m⁷G cap with a non-functional cap analogue ApppG. As expected, the
resultant mRNA did not support translation in the absence of 5′UTR
or in the presence of tubulin 5′UTR (Fig. 4a and Extended Data
Fig. 9b). Only when the Hsp70 5′UTR was present was the
translation-promoting feature clearly manifested after m⁶A incorporation, in
particular under stress conditions (Fig. 4a). This effect is specific to
m⁶A modification but not m⁶Am because ribose methylation in the
form of 2′-O-MeA suppressed translation of the Fluc reporter bearing
Hsp70 5′UTR (Fig. 4a). Therefore, methylation of Hsp70 5′UTR in the
form of m⁶A promotes cap-independent translation.

To further demonstrate the 5′UTR specificity in m⁶A-facilitated
cap-independent translation, we examined 5′UTRs from a
constitutively expressed chaperone Hsc70 (HSPA8) and another
stress-inducible chaperone Hsp105 (HSPH1) (Fig. 3a). Only the 5′UTR of
Hsp105 enhanced translation of the non-capped message after m⁶A
incorporation (Extended Data Fig. 9c). This result is consistent with
the selective 5′UTR methylation of stress-inducible transcripts upon
heat shock stress.

Figure 4 | Selective 5′UTR m⁶A modification mediates cap-independent
translation. a, MEF cells transfected with Fluc mRNA reporters were subject to
heat shock treatment and the Fluc activity was measured by real-time
luminometry. Fluc activities were quantified and normalized to the sample
containing normal adenosine nucleotides. Red, m⁶A; green, 2′-O-MeA.
b, Constructs expressing Fluc reporter with Hsp70 5′UTR or the one with
A103C mutation are depicted on the top. Fluc activities in transfected MEF cells
were quantified and normalized to the control containing normal A without
stress. c, Fluc mRNAs bearing Hsp70 5′UTR with a single m⁶A site were
constructed using sequential splint ligation. After in vitro translation in rabbit
reticulocyte lysates, Fluc activities were quantified and normalized to the control
lacking m⁶A. Error bars, mean ± s.e.m.; *P < 0.05, unpaired two-tailed
t-test; n = 3 biological replicates (a, b and c). d, A proposed model for dynamic
m⁶A 5′UTR methylation in response to stress and its role in cap-independent
translation. Under the normal growth condition, nuclear FTO demethylates
the 5′UTR m⁶A from nascent transcripts and the matured transcripts are
translated via a cap-dependent mechanism. Under stress conditions, nuclear
localization of YTHDF2 protects the 5′UTR of stress-induced transcripts from
demethylation. With enhanced 5′UTR methylation, these transcripts are
selectively translated via a cap-independent mechanism.

The 5'UTR contains multiple As, although not all of them are 
methylated. On the basis of the predicted m6A sequence motif, the 
A residue at position 103 of Hsp70 mRNA is likely to be methylated. 
Using a single-nucleotide m6A detection method, we confirmed 
the methylation event at this position upon heat shock stress 
(Extended Data Fig. 10a). To demonstrate the significance of methylation 
at this single site, we introduced an A103C mutation into the Hsp70 
5'UTR. Remarkably, m6A incorporation no longer promoted translation 
of the Fluc reporter in transfected cells (Fig. 4b). To directly 
demonstrate the importance of this single m6A site without changing 
the nucleotide, we employed a sequential RNA splint ligation strategy to 
construct a Fluc reporter bearing Hsp70 5'UTR with or without A103 
methylation (Fig. 4c). Using an in vitro translation system, the Fluc 
reporter containing the single m6A at position 103 showed about a 
50% increase in translation efficiency in comparison to the one with 
normal A (Fig. 4c). Notably, both messages showed comparable turnover 
during the entire course of in vitro translation (Extended Data 
Fig. 10c). Collectively, these results firmly established a crucial role of 
5'UTR m6A modification in non-canonical translation initiation. 

Much of our current understanding of cap-independent translation 
is limited to the IRES mechanism. However, beyond a few examples, 
many cellular genes capable of cap-independent translation do 
not seem to contain any IRES elements. The results presented here 
demonstrate a surprising role of m6A in mediating mRNA translation 
initiation independent of the normal m7G cap. How exactly the 
methylated adenosine recruits the translation machinery merits further 
investigation. m6A modification has been shown to alter RNA 
secondary structures. It is possible that distinct translation initiation 
factors are recruited to the methylated 5'UTR, thereby facilitating 
cap-independent translation. 

In contrast to the widely held belief that m6A modification is static on 
mRNAs, we found that 5'UTR methylation in the form of m6A is 
dynamic. Methylation often serves as a mark to distinguish self and 
foreign DNAs or parental and daughter DNA strands. The stress-inducible 
mRNA 5'UTR methylation permits ribosomes to distinguish 
nascent transcripts from pre-existing messages, thereby achieving 
selective mRNA translation (Fig. 4d). The unexpected stress-inducible 
feature of YTHDF2 offers an elegant mechanism for temporal control 
of m6A modification on subsets of mRNAs. The mechanistic connection 
between 5'UTR methylation and cap-independent translation 
solves the central puzzle of how selective translation is achieved when 
global translation is suppressed in response to stress. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Cell lines and reagents. HeLa (cervical cancer) was originally purchased from 
ATCC and MEF cells were a gift from D. J. Kwiatkowski (Harvard Medical 
School). Cells were not authenticated recently but tested negative for mycoplasma 
contamination. Both cell lines were maintained in Dulbecco’s Modified Eagle’s 
Medium (DMEM) with 10% fetal bovine serum (FBS). Antibodies used in the 
experiments are listed below: anti-YTHDF2 (Proteintech 24744-1-AP, 1:1,000 
WB, 1:600 IF); anti-Hsp70 (Stressgen SPA-810, 1:1,000 WB); anti-FTO 
(Phosphosolutions 597-Fto, 1:1,000 WB, 1:600 IF); anti-METTL3 (Abnova 
H00056339-B01P, 1:1,000 WB, 1:600 IF); anti-METTL14 (Sigma HPA038002, 
1:1,000 WB, 1:600 IF); anti-WTAP (Santa Cruz sc-374280, 1:1,000 WB, 1:600 
IF); anti-m6A (Millipore ABE572, 1:1,000 m6A immunoblotting); Alexa Fluor 
546 donkey anti-mouse secondary antibody (Invitrogen A10036, 1:600 IF); 
Alexa Fluor 546 donkey anti-rabbit secondary antibody (Invitrogen A10040, 
1:600 IF). 

Construction of 5'UTR reporters. The Fluc reporter with Hsp70 5'UTR has been 
reported previously. For Fluc reporters bearing other 5'UTRs, the following 
primers were used for 5'UTR cloning: 
Hsc70 (HSPA8) forward, 5'-CCCAAGCTTGGTCTCATTGAACGCGG-3'; 
reverse, 5'-CGGGATCCCCTTAGACATGGTTGCTT-3'; 
Tubulin (TUBG2) forward, 5'-GGCAAGCTTTGCGCCTGTGCTGAATTCCAGCTGC-3'; 
reverse, 5'-GGCGGATCCGCATCGCCGATCAGACCTAG-3'; 
Hsp105 (HSPH1) forward, 5'-CCCAAGCTTGTAAAATGCTGCAGATTC-3'; 
reverse, 5'-CGGGATCCCCACCGACATGGCTGGCCCG-3'. 
Lentiviral shRNAs. All shRNA targeting sequences were cloned into DECIPHER 
pRSI9-U6-(sh)-UbiC-TagRFP-2A-Puro (Cellecta). shRNA targeting sequences 
listed below were based on the RNAi Consortium at the Broad Institute 
(http://www.broad.mit.edu/rnai/trc). YTHDF2 (mouse): 
5'-GCTCCAGGCATGAATACTATA-3'; FTO (mouse): 5'-GCTGAGGCAGTTCTGGTTTCA-3'; 
scramble control sequence: 5'-AACAGTCGCGTTTGCGACTGG-3'. Lentiviral particles 
were packaged using Lenti-X 293T cells (Clontech). Virus-containing supernatants 
were collected at 48 h after transfection and filtered to eliminate cells. MEF cells 
were infected with the lentivirus for 48 h before selection with 1 µg ml⁻¹ puromycin. 
Recombinant protein expression. YTHDF2 and FTO were cloned into vector 
pGEX-6P-1 using the following primers: 
YTHDF2 forward, 5'-ATGAATTCCCATCGGCCAGCAGCCTCTTG-3'; 
reverse, 5'-CCGCTCGAGTTCTATTTCCCACGACCTTGA-3'; 
FTO forward, 5'-ATGAATTCAGCATGAAGCGCGTCCAGACC-3'; 
reverse, 5'-CCGCTCGAGCCTCTAGGATCTTGC-3'. 

The resulting clones were transformed into the Escherichia coli strain BL21 and 
expression was induced at 22 °C with 1 mM IPTG for 16-18 h. The pellet collected 
from 1 l of bacterial culture was then lysed in 15 ml PBS (50 mM NaH2PO4, 
150 mM NaCl, pH 7.2, 1 mM PMSF, 1 mM DTT, 1 mM EDTA, 0.1% (v/v) 
Triton X-100) and sonicated for 10 min. After removing cell debris by 
centrifugation at 12,000 r.p.m. for 30 min, the protein extract was mixed with 2 ml 
equilibrated Pierce glutathione agarose on an end-over-end rotator for 2 h at 
4 °C. The resin was washed three times with ten resin-bed volumes of 
equilibration/wash buffer (50 mM Tris, 150 mM NaCl, pH 8.0). YTHDF2 and FTO 
proteins were cleaved from the glutathione agarose using PreScission Protease 
(Genscript) in cleavage buffer (50 mM Tris-HCl, pH 7.0, 150 mM NaCl, 1 mM 
EDTA, 1 mM DTT) at 4 °C overnight. 

Immunoblotting. Cells were lysed on ice in TBS buffer (50 mM Tris, pH 7.5, 
150 mM NaCl, 1 mM EDTA) containing protease inhibitor cocktail tablet, 1% 
Triton X-100, and 2 U ml⁻¹ DNase. After incubating on ice for 30 min, the lysates 
were heated for 10 min in SDS/PAGE sample buffer (50 mM Tris (pH 6.8), 
100 mM dithiothreitol, 2% SDS, 0.1% bromophenol blue, 10% glycerol). 
Proteins were separated on SDS-PAGE and transferred to Immobilon-P 
membranes (Millipore). Membranes were blocked for 1 h in TBS containing 5% non-fat 
milk and 0.1% Tween-20, followed by incubation with primary antibodies 
overnight at 4 °C. After incubation with horseradish-peroxidase-coupled secondary 
antibodies at room temperature for 1 h, immunoblots were visualized using 
enhanced chemiluminescence (ECL Plus, GE Healthcare). 

Immunofluorescence staining. Cells grown on glass coverslips were fixed in 4% 
paraformaldehyde for 10 min at 4 °C. After permeabilization in 0.2% Triton X-100 
for 5 min at room temperature, the cover slips were blocked with 1% BSA for 1h. 
Cells were stained with indicated primary antibody overnight at 4 °C, followed 
by incubation with Alexa Fluor 546 donkey anti-mouse secondary antibody or 
Alexa Fluor 546 donkey anti-rabbit secondary antibody for 1h at room temper- 
ature. The nuclei were counter-stained with DAPI (1:1,000 dilution) for 10 min. 
Cover slips were mounted onto slides and visualized using a Zeiss LSM710 con- 
focal microscope. 
mRNA stability measurement. Cells were treated with actinomycin D (5 µg ml⁻¹) 
for 4 h, 2 h and 0 h before trypsinization and collection. RNA spike-in control was 
added in proportion to the total cell number and total RNA was isolated using the TRIzol 
kit (Life Technologies). After reverse transcription, the mRNA levels of transcripts 
of interest were detected by real-time quantitative PCR. 

Real-time quantitative PCR. Total RNA was isolated by TRIzol reagent 
(Invitrogen) and reverse transcription was performed using High Capacity 
cDNA Reverse Transcription Kit (Invitrogen). Real-time PCR analysis was 
conducted using Power SYBR Green PCR Master Mix (Applied Biosystems) and 
carried out on a LightCycler 480 Real-Time PCR System (Roche Applied Science). 
Primers for amplifying each target were: 
YTHDF2 forward, 5'-CAGTTTGCCTCCAGCTACTATT-3'; reverse, 5'-GCAATGCCATTCTTGGTCTTC-3'; 
FTO forward, 5'-TCAGCAGTGGCAGCTGAAAT-3'; reverse, 5'-CTTGGATCCTCACCACGTCC-3'; 
Hsp70 forward, 5'-TGGTGCAGTCCGACATGAAG-3'; reverse, 5'-GCTGAGAGTCGTTGAAGTAGGC-3'; 
METTL3 forward, 5'-ATCCAGGCCCATAAGAAACAG-3'; reverse, 5'-CTATCACTACGGAAGGTTGGG-3'; 
METTL14 forward, 5'-CAGGCAGAGCATGGGATATT-3'; reverse, 5'-TCCGACCTGGAGACATACAT-3'; 
ALKBH5 forward, 5'-AGTTCCAGTTCAAGCCCATC-3'; reverse, 5'-GGCGTTCCTTAATGTCCTGAG-3'; 
WTAP forward, 5'-CTGGCAGAGGAGGTAGTAGTTA-3'; reverse, 5'-ACTGGAGTCTGTGTCATTTGAG-3'; 
β-actin forward, 5'-TTGCTGACAGGATGCAGAAG-3'; reverse, 5'-ACTCCTGCTTGCTGATCCACAT-3'; 
GAPDH forward, 5'-CAAGGAGTAAGAAACCCTGGAC-3'; reverse, 5'-GGATGGAAATTGTGAGGGAGAT-3'; 
Fluc forward, 5'-ATCCGGAAGCGACCAACGCC-3'; reverse, 5'-GTCGGGAAGACCTGCCACGC-3'. 

In vitro transcription. Plasmids containing the corresponding 5'UTR sequences 
of mouse HSPA1A and full-length firefly luciferase were used as templates. 
Transcripts with normal m7G cap were generated using the mMessage 
mMachine T7 Ultra kit (Ambion) and transcripts with non-functional cap 
analogue GpppA were synthesized using MEGAscript T7 Transcription Kit 
(Ambion). To obtain mRNAs with the adenosine replaced with m6A, in vitro 
transcription was conducted in a reaction in which 5% of the adenosine was 
replaced with N6-methyladenosine. All mRNA products were purified using the 
MEGAclear kit (Ambion) according to the manufacturer’s instructions. 

In vitro translation. In vitro translation was performed using the Rabbit 
Reticulocyte Lysate System (Promega) according to the manufacturer’s instruc- 
tions. Luciferase activity was measured using a luciferase reporter assay system 
(Promega) on a Synergy HT Multi-detection Microplate Reader (BioTek 
Instruments). 

Real-time luciferase assay. Cells grown in 35-mm dishes were transfected with 
in-vitro-synthesized mRNA containing the luciferase gene. Luciferase substrate 
D-luciferin (1 mM, Regis Tech) was added into the culture medium immediately 
after transfection. Luciferase activity was monitored and recorded using a Kronos 
Dio Luminometer (Atto). 

Site-specific m6A detection. For site-specific m6A detection, DNA primers were 
first 5' labelled with 32P using T4 polynucleotide kinase (Invitrogen) and purified 
by ethanol precipitation. The primer 5'-AGGGATGCTCTGGGGAAGGCTGG-3' 
was used to detect the potential m6A site and the primer 
5'-CGCCGCTCGCTCTGCTTCTCTTGTCTTCGCT-3' was used to detect the non-methylated 
site. Synthesized mRNAs 5'-CGATCCTCGGCCAGG(m6A)CCAGCCTTCCCCAG-3' 
and 5'-CGATCCTCGGCCAGGACCAGCCTTCCCCAG-3' served as 
positive and negative control templates, respectively. To set up the reaction, a 
2× annealing solution was prepared in a total volume of 8 µl with 1× Tth buffer 
(Promega) or AMV buffer (Invitrogen), 1 µl of each radiolabelled primer and 
10 µg mRNA from MEF cells that had been heat shock treated. The mixture was 
heated at 95 °C for 10 min and cooled slowly to room temperature. 3 µl of 
annealing solution was combined with 2 µl of enzyme and heated at 37 °C (AMV 
Reverse Transcriptase) or 55 °C (Tth DNA Polymerase) for 2 min. After adding 
the dTTP solution (final dTTP concentration: 100 µM), the reactions were heated 
for 5 min at 37 °C (AMV) or 10 min at 55 °C (Tth). Reaction products were 
resolved on a 20% denaturing polyacrylamide gel and exposed overnight. 

RNA splint ligation. The ligation method was optimized from previous 
reports (including ref. 31). The RNA oligonucleotide covering the 82-117 nt region of 
HSPA1A was synthesized by Thermo Scientific, whereas RNA fragments 
corresponding to other regions were generated by in vitro transcription. For sequential 
splint ligation, two DNA bridging oligonucleotides were designed: DNA Bridge 1, 
5'-GGTCCTGGCCGAGGATCGGGAACGCGCCGCTCGCTC-3'; DNA Bridge 2, 
5'-CTCCGCGGCAGGGATGCTCTGGGGAAGGCTGGTCCT-3'. 

For 3' RNA oligonucleotide (donor) phosphorylation, 1 µl of 20 µM donor 
oligonucleotide was mixed with 1 µl of 10× PNK buffer, 6 µl of ATP (10 mM), 
0.5 µl of RNasin (20 units) and 1 µl of T4 PNK (5 units). The reaction mixture was 
incubated at 37 °C for 30 min followed by inactivation of T4 PNK at 65 °C for 
20 min. Next, the DNA bridge oligonucleotide was hybridized with the 3' RNA 
oligonucleotide and the 5' RNA oligonucleotide (acceptor) at a 1:1.5:2 ratio 
(5' RNA:bridge:3' RNA). Oligonucleotides were annealed (95 °C for 1 min followed 
by 65 °C for 2 min and 37 °C for 10 min) in the presence of 1× T4 DNA dilution 
buffer. To ligate the 5' and the 3' RNA together, T4 DNA ligase and the T4 DNA 
ligation buffer were added and the reaction mixture was incubated at 37 °C for 1 h. 
The ligation was stopped by adding 1 µl of 0.5 M EDTA followed by 
phenol-chloroform extraction and ethanol precipitation. Ligation products were analysed 
by 10% TBE-Urea gels or formaldehyde gels. The expected RNA ligation products 
in TBE-Urea gels were eluted in RNA gel elution buffer (300 mM NaOAc pH 5.5, 
1 mM EDTA and 0.1 U µl⁻¹ SUPERase·In) followed by ethanol precipitation. The 
final products in formaldehyde gels were isolated using the Zymoclean Gel RNA 
Recovery Kit (Zymo Research). 

Hsp70 mRNA pull-down and m6A immunoblotting. To isolate endogenous 
Hsp70 mRNA, 400 pmol of biotin-labelled probe 
(5'-TTCATAACATATCTCTGTCTCTT-3') was incubated with 2 mg M-280 Streptavidin 
Dynabeads (Life Technologies) in 1 ml 1× B&W buffer (5 mM Tris-HCl pH 7.5, 0.5 mM EDTA 
and 1 M NaCl) at 4 °C for 1 h. 2 mg total RNA was denatured at 75 °C for 2 min 
and added to the pre-coated Dynabeads for an additional incubation of 2 h at 4 °C. 
Captured RNA was eluted by heating beads for 2 min at 90 °C in 10 mM EDTA 
with 95% formamide followed by TRIzol LS isolation. Isolated RNA was 
quantified using a NanoDrop ND-1000 UV-Vis Spectrophotometer and equal amounts 
of RNAs were mixed with 2× RNA Loading Dye (Thermo Scientific) and 
denatured for 3 min at 70 °C. In-vitro-transcribed mRNA containing 50% 
N6-methyladenosine or 100% adenosine was used as positive and negative control, 
respectively. Samples were separated on a formaldehyde denaturing agarose gel 
and transferred to a positively charged nylon membrane by siphonage in transfer 
buffer (10 mM NaOH, 3 M NaCl) overnight at room temperature. After transfer, 
the membrane was washed for 5 min in 2× SSC buffer and RNA was UV 
crosslinked to the membrane. The membrane was blocked for 1 h in PBST containing 5% 
non-fat milk and 0.1% Tween-20, followed by incubation with anti-m6A antibody 
(1:1,000 dilution) overnight at 4 °C. After extensive washing with 0.1% PBST 
three times, the membrane was incubated with HRP-conjugated anti-rabbit IgG 
(1:5,000 dilution) for 1 h. The membrane was visualized using enhanced 
chemiluminescence (ECL Plus, GE Healthcare). 

YTHDF2 and FTO in vitro pull down. Synthesized mRNA (100 pmol) with a 
single m6A at A103 was labelled with biotin using the Pierce RNA 3' End 
Desthiobiotinylation Kit. Binding of the labelled RNA to streptavidin magnetic 
beads was performed in RNA capture buffer (20 mM Tris, pH 7.5, 1 M NaCl, 
1 mM EDTA) for 30 min at room temperature with rotation. The RNA-protein 
binding reaction was conducted in protein-RNA binding buffer (20 mM Tris (pH 
7.5), 50 mM NaCl, 2 mM MgCl2, 0.1% Tween-20 Detergent) at 4 °C for 60 min with 
rotation. After washing three times with the wash buffer (20 mM Tris pH 7.5, 
10 mM NaCl, 0.1% Tween-20 Detergent), protein was eluted with Biotin Elution 
Buffer (Pierce) and detected by western blot. 

YTHDF2 and FTO in vitro competition assay. The YTHDF2 and FTO in vitro 
competition assay was performed in 100 µl of reaction mixture containing 5 µM 
RNA incorporated with 50% m6A, 283 µM (NH4)2Fe(SO4)2·6H2O, 300 µM 
α-KG, 2 mM L-ascorbic acid, 50 µg ml⁻¹ BSA, and 50 mM HEPES buffer 
(pH 7.0). The reaction was incubated for 3 h at room temperature, and quenched 
by adding 5 mM EDTA followed by heating for 5 min at 95 °C. RNA was isolated 
by TRIzol LS and quantified using a NanoDrop ND-1000 UV-Vis 
Spectrophotometer. Equal amounts of RNA were used for dot blotting and methylene blue 
staining was used to show the amount of RNA on hybridization membranes. 
Polysome profiling analysis. Sucrose solutions were prepared in polysome buffer 
(10 mM HEPES, pH 7.4, 100 mM KCl, 5 mM MgCl2, 100 µg ml⁻¹ cycloheximide 
and 2% Triton X-100). Sucrose density gradients (15-45%, w/v) were freshly 
prepared in SW41 ultracentrifuge tubes (Beckman) using a Gradient Master 
(BioComp Instruments). Cells were pre-treated with 100 µg ml⁻¹ cycloheximide 
for 3 min at 37 °C followed by washing using ice-cold PBS containing 100 µg ml⁻¹ 
cycloheximide. Cells were then lysed in polysome lysis buffer. Cell debris was 
removed by centrifugation at 14,000 r.p.m. for 10 min at 4 °C. 500 µl of 
supernatant was loaded onto sucrose gradients followed by centrifugation for 2 h 28 min at 
38,000 r.p.m. and 4 °C in a SW41 rotor. Separated samples were fractionated at 
0.75 ml min⁻¹ through an automated fractionation system (Isco) that continually 
monitors OD254 values. An aliquot of the ribosome fraction was used to extract total 
RNA using Trizol LS reagent (Invitrogen) for real-time PCR analysis. 

RNA-seq and m6A-seq. For m6A immunoprecipitation, total RNA was first 
isolated using TRIzol reagent followed by fragmentation using freshly prepared RNA 
fragmentation buffer (10 mM Tris-HCl, pH 7.0, 10 mM ZnCl2). 5 µg fragmented 
RNA was saved as input control for RNA-seq. 1 mg fragmented RNA was 
incubated with 15 µg anti-m6A antibody (Millipore ABE572) in 1× IP buffer (10 mM 
Tris-HCl, pH 7.4, 150 mM NaCl, and 0.1% Igepal CA-630) for 2 h at 4 °C. The 
m6A-IP mixture was then incubated with Protein A beads for an additional 2 h at 4 °C 
on a rotating wheel. After washing three times with IP buffer, bound RNA was 
eluted using 100 µl elution buffer (6.7 mM N6-methyladenosine 5'-monophosphate 
sodium salt in 1× IP buffer), followed by ethanol precipitation. Precipitated 
RNA was used for cDNA library construction and high-throughput sequencing 
described below. 

Ribo-seq. Ribosome fractions separated by sucrose gradient sedimentation were 
pooled and digested with E. coli RNase I (Ambion, 750 U per 100 A260 units) by 
incubation at 4 °C for 1 h. SUPERase inhibitor (50 U per 100 U RNase I) was then 
added into the reaction mixture to stop digestion. Total RNA was extracted using 
TRIzol reagent. Purified RNA was used for cDNA library construction and high- 
throughput sequencing described below. 

cDNA library construction. Fragmented RNA input and m6A-IP eluates were 
dephosphorylated for 1 h at 37 °C in a 15 µl reaction (1× T4 polynucleotide kinase 
buffer, 10 U SUPERase·In and 20 U T4 polynucleotide kinase). The products were 
separated on a 15% polyacrylamide TBE-urea gel (Invitrogen) and visualized using 
SYBR Gold (Invitrogen). Selected regions of the gel corresponding to 40-60 nt (for 
RNA-seq and m6A-seq) or 25-35 nt (for Ribo-seq) were excised. The gel slices 
were disrupted by centrifugation through the holes at the bottom of the tube. 
RNA fragments were dissolved by soaking overnight in 400 µl gel elution buffer 
(300 mM NaOAc, pH 5.5, 1 mM EDTA, 0.1 U µl⁻¹ SUPERase·In). The gel debris 
was removed using a Spin-X column (Corning), followed by ethanol precipitation. 
Purified RNA fragments were re-suspended in nuclease-free water. The poly(A) tailing 
reaction was carried out for 45 min at 37 °C (1× poly(A) polymerase buffer, 1 mM 
ATP, 0.75 U µl⁻¹ SUPERase·In and 3 U E. coli poly(A) polymerase). 

For reverse transcription, the following oligonucleotides containing barcodes 
were used: 
MCA02, 5'-pCAGATCGTCGGACTGTAGAACTCT@CAAGCAGAAGACGGCATACGATTTTTTTTTTTTTTTTTTTTVN-3'; 
LGT03, 5'-pGTGATCGTCGGACTGTAGAACTCT@CAAGCAGAAGACGGCATACGATTTTTTTTTTTTTTTTTTTTVN-3'; 
YAG04, 5'-pAGGATCGTCGGACTGTAGAACTCT@CAAGCAGAAGACGGCATACGATTTTTTTTTTTTTTTTTTTTVN-3'; 
HTC05, 5'-pTCGATCGTCGGACTGTAGAACTCT@CAAGCAGAAGACGGCATACGATTTTTTTTTTTTTTTTTTTTVN-3'. 

In brief, the tailed-RNA sample was mixed with 0.5 mM dNTP and 2.5 mM 
synthesized primer and incubated at 65 °C for 5 min, followed by incubation on ice 
for 5 min. The following was then added to the reaction mix: 20 mM Tris (pH 8.4), 
50 mM KCl, 5 mM MgCl2, 10 mM DTT, 40 U RNaseOUT and 200 U SuperScript 
III. The reverse-transcription reaction was performed according to the 
manufacturer’s instructions. Reverse-transcription products were separated on a 10% 
polyacrylamide TBE-urea gel as described earlier. The extended first-strand product 
band was expected to be approximately 100 nt, and the corresponding region was 
excised. The cDNA was recovered using DNA gel elution buffer (300 mM NaCl, 
1 mM EDTA). First-strand cDNA was circularized in a 20 µl reaction containing 
1× CircLigase buffer, 2.5 mM MnCl2, 1 M betaine, and 100 U CircLigase II 
(Epicentre). Circularization was performed at 60 °C for 1 h, and the reaction 
was heat-inactivated at 80 °C for 10 min. Circular single-strand DNA was 
re-linearized with 20 mM Tris-acetate, 50 mM potassium acetate, 10 mM 
magnesium acetate, 1 mM DTT, and 7.5 U APE 1 (NEB). The reaction was carried out at 
37 °C for 1 h. The linearized single-strand DNA was separated on a Novex 10% 
polyacrylamide TBE-urea gel (Invitrogen) as described earlier. The expected 
100-nt product bands were excised and recovered as described earlier. 

Deep sequencing. Single-stranded template was amplified by PCR using the 
Phusion High-Fidelity enzyme (NEB) according to the manufacturer’s 
instructions. The oligonucleotide primers qNTI200 (5'-CAAGCAGAAGACGGCATA-3') 
and qNTI201 (5'-AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGACG-3') 
were used to create DNA suitable for sequencing, that 
is, DNA with Illumina cluster generation sequences on each end and a sequencing 
primer binding site. The PCR contained 1× HF buffer, 0.2 mM dNTP, 0.5 µM 
oligonucleotide primers, and 0.5 U Phusion polymerase. PCR was carried out with 
an initial 30 s denaturation at 98 °C, followed by 12 cycles of 10 s denaturation at 
98 °C, 20 s annealing at 60 °C, and 10 s extension at 72 °C. PCR products were 
separated on a non-denaturing 8% polyacrylamide TBE gel as described earlier. 
Expected DNA at 120 bp (for Ribo-seq) or 140 bp (for RNA-seq and m6A-seq) 
was excised and recovered as described earlier. After quantification by Agilent 
BioAnalyzer DNA 1000 assay, equal amounts of barcoded samples were pooled 
into one sample. Approximately 3-5 pM mixed DNA samples were used for 
cluster generation followed by deep sequencing using sequencing primer 
5'-CGACAGGTTCAGAGTTCTACAGTCCGACGATC-3' (Illumina HiSeq). 

Preprocessing of sequencing reads. For Ribo-seq, the sequencing reads were first 
trimmed by 8 nt from the 3' end and trimmed reads were further processed by 
removing the adenosine (A) stretch from the 3' end (one mismatch was allowed). 
The processed reads between 25 nt and 35 nt were first mapped by Tophat (ref. 32) using 
parameters (--bowtie1 -p 10 --no-novel-juncs) to the mouse transcriptome (UCSC 
Genes). The unmapped reads were then mapped to the corresponding mouse 
genome (mm10). Non-uniquely mapped reads were disregarded for further 
analysis owing to ambiguity. The same mapping procedure was applied to RNA-seq 
and m6A-seq. For Ribo-seq, the 13th position (12 nt offset from the 5' end) of the 
uniquely mapped read was defined as the ribosome ‘P-site’ position. The RPF 
density was computed after mapping uniquely mapped reads to each individual 
mRNA transcript according to the NCBI RefSeq gene annotation. Uniquely 
mapped reads of RNA-seq and Ribo-seq in the mRNA coding region were used 
to calculate the RPKM values for estimating mRNA expression and translation 
levels, respectively. For m6A-seq, uniquely mapped reads in the 5'UTR were used 
to calculate the RPKM values for estimating the m6A levels. 
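
The read-level steps above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, the poly(A) rule (strip the longest 3' suffix that starts with A and contains at most one non-A base) is one possible reading of "one mismatch was allowed", coordinates are assumed to be 0-based and half-open, and RPKM uses the standard reads-per-kilobase-per-million definition.

```python
def trim_read(seq, clip=8, max_mismatch=1):
    """Clip `clip` nt from the 3' end, then strip the longest 3' suffix that
    starts with A and holds at most `max_mismatch` non-A bases (assumed rule)."""
    seq = seq[:-clip] if clip else seq
    mismatches = 0
    cut = len(seq)                      # index where the A stretch begins
    for i in range(len(seq) - 1, -1, -1):
        if seq[i] == "A":
            cut = i
        else:
            mismatches += 1
            if mismatches > max_mismatch:
                break
    return seq[:cut]


def keep_rpf(seq, lo=25, hi=35):
    """Keep processed reads whose length is between 25 and 35 nt."""
    return lo <= len(seq) <= hi


def p_site(start, end, strand, offset=12):
    """P-site coordinate: 13th read position (12 nt offset from the 5' end)
    for a read aligned to the 0-based half-open interval [start, end)."""
    return start + offset if strand == "+" else end - 1 - offset


def rpkm(read_count, region_len_nt, total_mapped_reads):
    """Reads per kilobase of region per million uniquely mapped reads."""
    return read_count * 1e9 / (region_len_nt * total_mapped_reads)
```

For a given transcript, `rpkm` would be applied to coding-region counts for RNA-seq and Ribo-seq, and to 5'UTR counts for m6A-seq, as described in the text.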

Identification of m6A sites. We used a scanning strategy similar to one reported 
previously to identify m6A peaks in the immunoprecipitation sample as compared to 
the input sample. In brief, for NCBI RefSeq genes whose maximal read coverage 
was greater than 15 in the input (RNA-seq), a sliding window of 80 nucleotides 
with step size of 40 nucleotides was employed to scan the longest isoform (on the 
basis of coding sequence (CDS) length; in the case of equal CDS, the isoform with 
longer 5'UTR was selected). For each window, a peak-over-median score (POM) 
was derived by calculating the ratio of mean read coverage in the window to the 
median read coverage of the whole gene body. Windows scoring higher than 3 in 
the IP sample were obtained and all the resultant overlapping m6A peak windows 
in the IP sample were iteratively clustered to infer the boundary of the 
m6A-enriched region, as well as peak position with maximal read coverage. Finally, a 
peak-over-input (POI) score was assigned to each m6A-enriched region by 
calculating the ratio of POM in the IP sample to that in the input sample. A putative 
m6A site was defined if the POI score was higher than 3. The peak position of each 
m6A site was classified into five mutually exclusive mRNA structural regions 
including TSS (the first 200 nucleotides of mRNA), 5'UTR, CDS, stop codon (a 
400 nt window flanking the mRNA stop codon) and 3'UTR. 
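
The scoring scheme can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: `call_m6a_regions` and `pom` are hypothetical names, coverage is assumed to be a per-nucleotide list for one transcript, overlapping windows are merged in a single left-to-right pass, and the POM of a merged region is assumed to be its mean coverage divided by the gene-body median.

```python
from statistics import mean, median


def pom(coverage, start, end):
    """Peak-over-median: mean coverage in [start, end) / gene-body median."""
    med = median(coverage)
    return mean(coverage[start:end]) / med if med > 0 else 0.0


def call_m6a_regions(ip, inp, window=80, step=40,
                     pom_cut=3.0, poi_cut=3.0, min_input_cov=15):
    """Return putative m6A regions for one transcript from IP/input coverage."""
    # Only scan genes whose maximal input coverage exceeds the threshold.
    if len(ip) != len(inp) or max(inp, default=0) <= min_input_cov:
        return []
    n = len(ip)
    # Windows whose POM in the IP sample exceeds the cutoff.
    hits = [s for s in range(0, max(n - window, 0) + 1, step)
            if pom(ip, s, min(s + window, n)) > pom_cut]
    # Merge overlapping windows into enriched regions.
    regions = []
    for s in hits:
        e = min(s + window, n)
        if regions and s < regions[-1][1]:
            regions[-1][1] = e
        else:
            regions.append([s, e])
    calls = []
    for s, e in regions:
        input_pom = pom(inp, s, e)
        if input_pom == 0:
            continue
        poi = pom(ip, s, e) / input_pom          # peak-over-input score
        if poi > poi_cut:
            peak = max(range(s, e), key=lambda i: ip[i])
            calls.append({"start": s, "end": e, "peak": peak, "poi": poi})
    return calls
```

Classification of each peak into TSS, 5'UTR, CDS, stop codon or 3'UTR is left out because the text does not state the precedence used to make the categories mutually exclusive.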

m6A motif analysis. The m6A peaks with POI score higher than 10 were selected 
for consensus motif finding. We used MEME Suite for motif analysis (ref. 33). In brief, the 
flanking sequences of m6A peaks (±40 nt) with POI scores were retrieved from 
the mouse transcriptome and were used as MEME input. 
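
Retrieval of the flanking sequences could look like the sketch below. The helper names and the FASTA header layout are hypothetical, and the window is assumed to be clipped at transcript ends; the resulting FASTA text would then be supplied to MEME.

```python
def flank(transcript_seq, peak, half_width=40):
    """Sequence within ±half_width nt of a 0-based peak position,
    clipped at the transcript boundaries."""
    start = max(peak - half_width, 0)
    end = min(peak + half_width + 1, len(transcript_seq))
    return transcript_seq[start:end]


def to_fasta(peaks, transcripts, half_width=40):
    """Format (transcript_id, peak_position) pairs as FASTA records."""
    records = []
    for tx_id, peak in peaks:
        seq = flank(transcripts[tx_id], peak, half_width)
        records.append(f">{tx_id}:{peak}\n{seq}")
    return "\n".join(records) + "\n"
```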


31. Maroney, P.A., Chamnongpol, S., Souret, F. & Nilsen, T. W. Direct detection of small 
RNAs using splinted ligation. Nature Protocols 3, 279-287 (2008). 

32. Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with 
RNA-seq. Bioinformatics 25, 1105-1111 (2009). 

33. Bailey, T. L. et al. MEME SUITE: tools for motif discovery and searching. Nucleic 
Acids Res. 37, W202-W208 (2009). 
Extended Data Figure 1 | Subcellular localization of the m6A machinery in 
cells before and after heat shock stress. a, MEF cells before or 2 h after 
heat shock (HS; 42 °C, 1 h) were immunostained with indicated antibodies. 
DAPI was used for nuclear staining. b, MEF (left panel) and HeLa cells 
(right panel) were subjected to heat shock stress (42 °C, 1 h) followed by recovery 
at 37 °C for various times. Anti-YTHDF2 immunostaining was counterstained 
by DAPI. Images are representative of at least 50 cells. Bar, 10 µm. 
Extended Data Figure 2 | mRNA stability and induction in response to heat 
shock stress. a, Effects of heat shock stress on mRNA stability. MEF cells 
without heat shock stress (No HS), immediately after heat shock stress 
(42 °C, 1 h) (Post HS 0 h), or 2 h recovery at 37 °C (Post HS 2 h) were subject to 
further incubation in the presence of 5 µg ml⁻¹ ActD. At the indicated times, 
mRNA levels were determined by qPCR. Error bars, mean ± s.e.m.; n = 3 
biological replicates. b, MEF cells were collected at indicated times after heat 
shock stress (42 °C, 1 h) followed by RNA extraction and real-time PCR. 
Relative levels of indicated transcripts are normalized to β-actin. Error bars, 
mean ± s.e.m.; n = 3 biological replicates. c, HSF1 wild-type (WT) and 
knockout (KO) cells were subject to heat shock stress (42 °C, 1 h) followed by 
recovery at 37 °C for various times. Real-time PCR was conducted to quantify 
transcripts encoding Hsp70 and YTHDF2. Relative levels of transcripts are 
normalized to β-actin. Error bars, mean ± s.e.m.; *P < 0.05, **P < 0.01, 
unpaired two-tailed t-test; n = 3 biological replicates. 
Extended Data Figure 3 | Characterization of m6A sites in MEF cells with or 
without heat shock stress. a, b, m6A profiling was conducted on MEF cells 
before (a) or 2 h after (b) heat shock (42 °C, 1 h). Left, pie chart presenting 
fractions of m6A peaks in different transcript segments. Right, sequence logo 
representing the consensus motif relative to m6A. CDS, coding sequence 
region. 
Extended Data Figure 4 | m6A profiling of HSPA8 in MEF cells with or 
without heat shock stress. An example of constitutively expressed transcript 
HSPA8 in MEF cells with or without heat shock stress. Coverage of m6A 
immunoprecipitation and control reads (input) are indicated in red and grey, 
respectively. The transcript architecture is shown below the x axis. 
Extended Data Figure 5 | Dynamic m6A modification of HSPA1A by 
YTHDF2 and FTO. An example of stress-induced transcript HSPA1A in 
post-stressed MEF cells with either YTHDF2 or FTO knockdown. Coverage of m6A 
immunoprecipitation and control reads (input) are indicated in red and 
blue, respectively. The transcript architecture is shown below the x axis. 
Extended Data Figure 6 | Direct competition between YTHDF2 and FTO 
in m6A binding. a, Synthesized mRNA with m6A was incubated with 
FTO (2 µg) in the presence of an increasing amount of YTHDF2 (0, 0.5, 1 and 
2 µg), followed by RNA pull-down and immunoblotting. b, Synthesized 
mRNA with m6A was incubated with FTO (1 µg in top panel and 2 µg in 
bottom panel) in the absence or presence of YTHDF2 (4 µg), followed by m6A 
dot blotting. 
Extended Data Figure 7 | YTHDF2 knockdown does not affect Hsp70 
transcription after stress. MEF cells with or without YTHDF2 knockdown 
were subject to heat shock stress (42 °C, 1 h) followed by recovery at 37 °C 
for various times. Real-time PCR was conducted to quantify Hsp70 mRNA 
levels. Error bars, mean ± s.e.m.; n = 3 biological replicates. sh-Scram, 
scrambled shRNA. 
Extended Data Figure 8 | FTO knockdown promotes Hsp70 synthesis. 
a, m6A blotting of purified HSPA1A in MEF cells with or without FTO knockdown. 
Messenger RNAs synthesized by in vitro transcription in the absence or 
presence of m6A were used as control. RNA staining is shown as loading 
control. Representative of two biological replicates. b, MEF cells with or 
without FTO knockdown were collected at indicated times after heat shock 
stress (42 °C, 1 h) followed by immunoblotting using antibodies indicated. 
N, no heat shock. Representative of three biological replicates. 
Extended Data Figure 9 | m⁶A modification promotes cap-independent translation. a, Fluc reporter mRNAs with or without 5′UTR were synthesized in the absence or presence of m⁶A. The transfected MEFs were incubated in the presence of 5 µg ml⁻¹ ActD. At the indicated times, mRNA levels were determined by qPCR. Error bars, mean ± s.e.m.; n = 3 biological replicates. b, Fluc reporter mRNAs with or without Hsp70 5′UTR were synthesized in the absence or presence of m⁶A, followed by addition of a non-functional cap analogue ApppG. Fluc activity in transfected MEF cells was recorded using real-time luminometry. c, Constructs expressing Fluc reporter bearing 5′UTR from Hsc70 or Hsp105 are depicted on the top. Fluc activities in transfected MEF cells were quantified and normalized to the control containing normal A. Error bars, mean ± s.e.m.; *P < 0.05, unpaired two-tailed t-test; n = 3 biological replicates.
Extended Data Figure 10 | Site-specific detection of m⁶A modification on HSPA1A. a, Sequences of HSPA1A template and the DNA primer used for site-specific detection. Synthesized mRNAs containing a single m⁶A site (red) or A (blue) are used as positive and negative controls, respectively. The red shading in the HSPA1A sequence indicates predicted m⁶A sites. Autoradiogram shows primer extension of controls (left panel) and endogenous HSPA1A (right panel). b, Fluc mRNAs with or without m⁶A incorporation were incubated in the rabbit reticulocyte lysate system (RRL) at 30 °C for up to 60 min. Messenger RNA levels were determined by qPCR. Error bars, mean ± s.e.m.; n = 3 biological replicates.
CORRECTIONS & AMENDMENTS

ERRATUM
doi:10.1038/nature15255

Erratum: Genetic diversity and evolutionary dynamics of Ebola virus in Sierra Leone

Yi-Gang Tong, Wei-Feng Shi, Di Liu, Jun Qian, Long Liang, Xiao-Chen Bo, Jun Liu, Hong-Guang Ren, Hang Fan, Ming Ni, Yang Sun, Yuan Jin, Yue Teng, Zhen Li, David Kargbo, Foday Dafae, Alex Kanu, Cheng-Chao Chen, Zhi-Heng Lan, Hui Jiang, Yang Luo, Hui-Jun Lu, Xiao-Guang Zhang, Fan Yang, Yi Hu, Yu-Xi Cao, Yong-Qiang Deng, Hao-Xiang Su, Yu Sun, Wen-Sen Liu, Zhuang Wang, Cheng-Yu Wang, Zhao-Yang Bu, Zhen-Dong Guo, Liu-Bo Zhang, Wei-Min Nie, Chang-Qing Bai, Chun-Hua Sun, Xiao-Ping An, Pei-Song Xu, Xiang-Li-Lan Zhang, Yong Huang, Zhi-Qiang Mi, Dong Yu, Hong-Wu Yao, Yong Feng, Zhi-Ping Xia, Xue-Xing Zheng, Song-Tao Yang, Bing Lu, Jia-Fu Jiang, Brima Kargbo, Fu-Chu He, George F. Gao, Wu-Chun Cao & The China Mobile Laboratory Testing Team in Sierra Leone


Nature 524, 93-96 (2015); doi:10.1038/nature14490 


This Letter should have contained an associated Creative Commons statement in the Author Information section. In addition, the Fig. 3c legend should have stated that the bar chart was ‘adapted, with permission, from Ebola response roadmap - Situation report, Figure 3; http://www.who.int/csr/disease/ebola/situation-reports/en (accessed 1 April 2015)’. These issues have now both been corrected in the online versions of the paper.


1. World Health Organization. Ebola response roadmap - Situation report. http:// 
www.who.int/csr/disease/ebola/situation-reports/en (accessed 1 April 2015). 

ERRATUM 
doi:10.1038/nature15704 


Erratum: Evidence for human transmission of amyloid-β pathology and cerebral amyloid angiopathy

Zane Jaunmuktane, Simon Mead, Matthew Ellis, Jonathan D. F. Wadsworth, Andrew J. Nicoll, Joanna Kenny, Francesca Launchbury, Jacqueline Linehan, Angela Richard-Loendt, A. Sarah Walker, Peter Rudge, John Collinge & Sebastian Brandner


Nature 525, 247-250 (2015); doi:10.1038/nature15369 


In this Letter, an administrative error led to the publication of an incorrect version of the Competing Financial Interests (CFI) statement. Although the published CFI statement did reference the authors’ affiliation with D-Gen, it did not contain all of the information provided by the authors about the interests of the company. The CFI statement for this paper as originally published was “J.C. is a Director and J.C. and J.D.F.W. are shareholders of D-Gen Limited, which supplies antibody ICSM35.” The updated CFI statement is “J.C. is a Director and J.C. and J.D.F.W. are shareholders of D-Gen Limited, an academic spin-out company working in the field of prion disease diagnosis, decontamination and therapeutics. D-Gen supplied antibody ICSM35.”


22 OCTOBER 2015 | VOL 526 | NATURE | 595 
©2015 Macmillan Publishers Limited. All rights reserved 



CORRIGENDUM 
doi:10.1038/nature15253 


Corrigendum: Lanosterol reverses protein aggregation in cataracts

Ling Zhao, Xiang-Jun Chen, Jie Zhu, Yi-Bo Xi, Xu Yang, Li-Dan Hu, Hong Ouyang, Sherrina H. Patel, Xin Jin, Danni Lin, Frances Wu, Ken Flagg, Huimin Cai, Gen Li, Guiqun Cao, Ying Lin, Daniel Chen, Cindy Wen, Christopher Chung, Yandong Wang, Austin Qiu, Emily Yeh, Wenqiu Wang, Xun Hu, Seanna Grob, Ruben Abagyan, Zhiguang Su, Harry Christianto Tjondro, Xi-Juan Zhao, Hongrong Luo, Rui Hou, J. Jefferson P. Perry, Weiwei Gao, Igor Kozak, David Granet, Yingrui Li, Xiaodong Sun, Jun Wang, Liangfang Zhang, Yizhi Liu, Yong-Bin Yan & Kang Zhang


Nature 523, 607-611 (2015); doi:10.1038/nature14650 


In this Letter, author Yong-Bin Yan was incorrectly associated with affiliation number 5 (Department of Ophthalmology, Xijing Hospital) instead of affiliation number 4 (State Key Laboratory of Membrane Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China). Also, an additional affiliation has been added to author Kang Zhang (number 15; Institute of Molecular Medicine, Peking University, Beijing 100871, China), and affiliation number 3 has changed from ‘Department of Ophthalmology and Biomaterials and Tissue Engineering Center’ to ‘Shiley Eye Institute and Biomaterials and Tissue Engineering Center’. These have all been corrected in the online versions of the paper.

CORRIGENDUM 
doi:10.1038/nature15370 


Corrigendum: Selective killing of cancer cells by a small molecule targeting the stress response to ROS

Lakshmi Raj, Takao Ide, Aditi U. Gurkar, Michael Foley, Monica Schenone, Xiaoyu Li, Nicola J. Tolliday, Todd R. Golub, Steven A. Carr, Alykhan F. Shamji, Andrew M. Stern, Anna Mandinova, Stuart L. Schreiber & Sam W. Lee


Nature 475, 231-234 (2011); doi:10.1038/nature10167 
corrigendum Nature 481, 534 (2012); doi:10.1038/nature10789 


In this Letter, we presented findings from experiments using the EJ 
bladder xenograft cancer model, in which some tumours on some of 
the animals exceeded the maximum size (15 mm in any dimension) 
permitted by the Institutional Animal Care and Use Committee 
(IACUC) at Massachusetts General Hospital (MGH). Therefore, we 
withdraw the data presented in Supplementary Fig. 9b, as well as in 
Fig. 2 from the first Corrigendum. Although other tumours were 
found to exceed the permitted maximum, owing to the degree of 
tumour size and the circumstances of the experimental procedures, 
this was considered acceptable as detailed below. 

The tumours in eight mice in Fig. 2b, d (same mice in Fig. 1a, b in the Corrigendum) also exceeded the tumour size approved in the IACUC protocol of the principal investigator (PI), and the mice were euthanized 48–72 h after the tumour burden was identified. Although the tumour size limit permitted by the protocol had been reached in the above-mentioned animals, the mice exhibited no other clinical signs associated with humane endpoints due to pain or distress. Given the lack of clinical signs observed and the timing of the clinical presentation, the PI was permitted to maintain the animals for this short time period to support proper tumour collection and fixation. The animals were regularly monitored during this time period. We also noted that we made an error in the first Corrigendum when calculating the volumes for the last time point of Fig. 1b. All measurements are now presented as Supplementary Data.

The in vivo experiments described in Supplementary Fig. 9 were performed between 2007 and 2008. At this time, the IACUC-approved protocol of the PI required daily monitoring of the tumour-bearing animals for clinical signs of distress, and did not require daily measurements of the tumour size. Measurements of tumour sizes were performed at the indicated time points. Consistent with the IACUC-approved guidelines, all animals were euthanized as soon as measurements indicated that the tumours reached the size limit approved in the protocol (15 mm). These mice did not show clinical signs of distress, and thus it only became apparent that the tumours had already exceeded 15 mm when they were measured. Since the completion of this study, the IACUC-approved protocol of the PI was revised, and daily measurements of tumour sizes, in addition to daily observation, are now required. For the xenograft tumour models, measurements were performed on the entire tumour lesion including cases when tumours appeared as aggregates of single nodules (melanoma model in Supplementary Fig. 9e). All measurements are now presented as Supplementary Data.

The MGH IACUC approved the animal models and the general procedures for drug testing used in this work; however, some compounds and cell lines used for the experiments in Fig. 2 and Supplementary Fig. 9 had not received prospective IACUC approval owing to an administrative oversight. All of the materials included in the original Letter and the earlier Corrigendum have since been approved by the IACUC for subsequent experiments. We would like to clarify that the detection of total p53 in Fig. 4c was performed after stripping the membrane following detection of phosphorylated p53 (Ser15 p-p53). In addition, the original paper incorrectly stated that the error bars in Figs 1c, 3a–d, 4a, b, and Supplementary Figs 2b, 3a, 4a, 6, 15c, 18, 20, 21, 26b and 29b were calculated based on three independent experiments. These graphs and the associated error bars represent the results of technical triplicates from one experiment. The error bars in the graphs throughout the main paper and Supplementary Information represent the standard deviation of the mean. The IACUC has reviewed all the data now presented, and confirms that the statements provided in this Corrigendum are accurate. Corrective measures have since been taken to avoid any irregularities happening again. Although the scientific conclusions of the original paper stand, we would like to apologize for the numerous inaccuracies in reporting our data, and for the breach of animal welfare guidelines in some of the original data.


Supplementary Information is linked to the online version of the paper at 
www.nature.com/nature. 
SCIENCE FICTION

PRIME TIME

Parental control.

BY JENNIFER CAMPBELL-HICKS

Aurelia’s Dad stood on the basement steps, his electrically charged hair bristled like porcupine quills. Behind him stood three more Dads who were exact copies of the first.

“Aurelia, thank goodness,” said the Dads. “I’m still at —”

“The movie with Mom? Yep.”

“Good. I used my time machine to come back and warn you. Tonight, before Mom and I get home, you will vanish without a trace.” The Dad in front grabbed her shoulders. “Fate is not set. I’m here to change it. And stop biting your nails.”

She snatched her fingers from her mouth. It was a coping mechanism. Senior year, harder classes, applying to colleges. There were no strategies for coping with this.

“You’re 11A,” she said. “11B, 11C, 11D.”

They looked confused. “What?”

“Just follow me.”

She led the four Dads to the living room. Six more copies sat on the leather couch, the love seat and the recliner. One leaned against the wall. They looked anxious. One Dad — 3A, she thought — clicked the remote at the television.

“What’s this?” the 11s asked.

“What do you think? Your machine is broken. It’s spitting you out, over and over. You’re coming out in groups so you always add up to a prime number. We had seven. Now it’s eleven.” She was proud that she had seen the pattern and gave credit to four years on the school maths team.

“Primes? Why?”

“How should I know? It’s your stupid machine. And before you ask, because you’ve already asked, there are 10 of you here because 1A is in the basement trying to shut down the machine.” Her gaze slid over them. “You should have listened to Mom. She said it was a bad idea to mess with the space-time continuum.”

Knocking shook the basement door.

“What was that?” a Dad asked.

“You,” Aurelia said and gnawed at her thumbnail. “Have a seat. If there’s one left.”

Back in the kitchen, she opened the basement door. This time, as she expected, two Dads stood on the steps.

“Aurelia, thank goodness —”

“I know. I vanished. You’re here to save me.” She pointed. “13A and 13B. Come with me.”

If this continued, dozens of Dads might be all over the house when her parents got home. She ran the numbers: 17, 19, 23, 29, 31, 37, 41, 43...

“What’s this?” the 13s asked in the living room.

“The others will explain. I need a volunteer.”

“Me,” they all said.

She pointed. “Who are you?”

“11B.”

“Come on.”

In the basement, the time machine 
hummed and clicked. It was an 8-foot 
cube of metal, gears, wires and pulsating 
light with an open hatch. Soon, if nothing 
changed, four more Dads would emerge 
from the hatch. The timing wasn’t regular. 
They could appear 10 seconds from now or 
10 minutes. Dad 1A peered out from behind 
the machine, grease on his cheeks. 

“There’s something wrong, obviously,” he said. “I can’t shut off the power.”

“It’ll keep spitting you out?” Aurelia asked.

He nodded.

“For how long? Eternity?”

“Or until the space-time continuum overloads. Can it do that? I don’t know.” He shook his head. “I should have listened to your mother. I’m sorry.”

He looked so sad that Aurelia hugged him.

“I wish we had more time,” 11B said.

That gave Aurelia an idea. “We do have 
time. I can jump forward and tell you not to 
jump back.” Then she thought of something 
else. “Wait. That must be how I vanished in 
the first place, which was why you came back 
to save me.” 

11B stepped forward. “I’ll go.”

“We can’t meet ourself like that,” said 1A. “Paradox risk.”

Aurelia chewed on two fingernails at once. Was this how the apocalypse started? The world became overrun by crazy-haired clones of her Dad? Then she remembered what he had said on the basement steps. Fate was not set. They could change it.

“You can’t shut off the machine,” she said. “Can you send it to another time?”

“I don’t know,” Dad 1A said, sounding dubious, but Dad 11B grew excited. “We can program it so it’s always five minutes from the present! If it’s always five minutes in our future, it will never overrun our now with clones.”

“And if I don’t go to warn you,” Aurelia added, “I won’t disappear, which means you won’t come here at all.”

It was perfect. They could solve the problem before it started, but Dad 1A’s shoulders slumped.

“If we send the machine to the future, I’ll lose it. All those years of work will be for nothing.”

“I’m sorry,” Aurelia said. “It doesn’t work right, anyway. The alternative is Dad clones coming out in primes into infinity. The Earth can’t handle that.”

He nodded. The two Dads got to work with the machine’s controls while Aurelia watched with her hands stuck firmly under her armpits so she wouldn’t chew on her nails. Then they said: “Ready to rock and roll. Be good, kiddo.”

They pressed a button. The machine 
hummed louder, glowed brighter. Aurelia 
squeezed her eyes shut. When she opened 
them, the machine was gone, leaving a 
square of clean concrete floor amid the dust. 
The Dads were gone, too. 

She sprinted upstairs. The living room 
was empty, thank goodness. Then a key 
turned in the front door. Aurelia’s fingers 
rose to her mouth, but she stopped and put 
her hands at her sides. She could handle this. 
The door swung open. Dad walked in with 
Mom behind him, both in nice date-night 
clothes. 

“Aurelia,” Dad said. “It’s a school night. Why aren’t you in bed?”

She took a deep breath. “Dad, do you have time to talk?”

Jennifer Campbell-Hicks is a writer, journalist, wife, mother and lifelong fan of science fiction and fantasy. Her fiction has appeared in Daily Science Fiction, Flash Fiction Online and Intergalactic Medicine Show. She blogs at jennifercampbellhicks.blogspot.com.
Lindau is a tiny island, covering two-thirds of a square kilometre. For most of the year, its medieval streets are filled with tourists. For one week in the summer, however, the Bavarian town becomes home to an exchange of ideas that circulate beyond its limits and, in the future, could stretch even further. This is because Lindau is home to the Nobel laureate meetings where past prize winners meet young researchers hoping to pick up wherever their illustrious mentors leave off.

The annual event began in 1951, two years before James Watson and Francis Crick published the structure of DNA. In 2015, it welcomed Elizabeth Blackburn (see page S56), who studies the genetic material at the ends of chromosomes, and Richard Roberts (S58), who found that genes often contain non-protein-coding portions that facilitate the creation of many different proteins from a single gene.

Similarly, at the time of the first Lindau meeting, physicists had not yet unified electromagnetism and the weak nuclear force. This year’s guests included François Englert (S61), who imagined the Higgs boson. Laureates Bruce Beutler (S59) and Susumu Tonegawa (S55) — specialists in cellular immunity and emotional memories, respectively — are also furthering fields that barely existed in 1951. This Outlook illuminates all of these Nobel prize winners’ discoveries. An in-depth look at super-resolution microscopy — the subject that opened this year’s Lindau meeting — describes the impact that the technology is having on molecular and cell biology (S50). This feature and the Q&A with Blackburn are accompanied by an animation; you can also hear the Q&As with Tonegawa and Beutler as a podcast (all available at www.nature.com/outlook/masterclass2015).

We are pleased to acknowledge financial support from 
Mars, Incorporated in producing this Outlook. As always, 
Nature has sole responsibility for all editorial content. 


Anna Petherick 
Contributing editor 
The equation describing the diffraction limit is engraved on a monument to optical physicist Ernst Abbe in Jena, Germany.


CELL IMAGING 


Beyond the limits 


Powerful super-resolution microscopes that allow researchers to explore the world at the 
nanoscale are set to transform our understanding of the cell. 


KATHERINE BOURZAC 


Dyche Mullins dons a pair of Oculus Rift 3D goggles and is transported from a cubicle at the University of California, San Francisco, into a virtual world. Through the virtual-reality headset, Mullins watches an immune cell that appears to be the size of a child. The cell crawls through a maze of collagen fibres. Mullins can move around the image, ‘pushing’ in so that the neutrophil is crawling directly above him, or turn the image around to watch the cell’s motions from different angles.

This is not a simulation. The video was made using a cutting-edge microscope invented by Eric Betzig, an engineer at the Howard Hughes Medical Institute’s Janelia Research Campus in Ashburn, Virginia. Foundational work in cell migration is based on studies done in the 1950s and 1960s in 2D, at low resolution and on glass slides. Seeing cells more clearly, says Mullins, who is a molecular biologist, is helping to overturn received wisdom about the fundamentals of how cells move, whether they are immune cells moving through tissue or amoebae in a pond. With new imaging tools, he can watch their movement in vivid detail.

These tools are redefining microscopy’s frontier, and their potential impact on biology is huge. Until the development of super-resolution microscopes, which began in 2000, life’s underlying molecular world appeared as a blur, only hinted at by relatively low-resolution light microscopy or captured in crisp, but static single-frame electron micrographs. “Now we have the proof that these barriers can be overcome,” says Stefan Hell, director at the Max Planck Institute for Biophysical Chemistry in Göttingen, Germany. “We’re just at the beginning of what’s possible.” Hell, a newly minted Nobel laureate, gave a lecture on the subject this year at the 65th Lindau Nobel Laureate Meeting in Germany, rousing excitement among his fellow laureates and the young scientists in attendance.

Hell was jointly awarded the Nobel Prize in Chemistry in 2014 with Betzig and William Moerner for their development of super-resolved fluorescence microscopy. Before their work, the nanoscale details of life in motion went unwitnessed. That has been a big blind spot. Life is dynamic: cells not only crawl around, like Mullins’ neutrophil, they divide, shuttle chemicals from one internal structure, or organelle, to another, and so much more. Aside from cell migration, researchers are using super-resolution microscopes to look at what happens when a group of viruses attacks a cell in real time, and they are turning these instruments on the chemical connections between neurons in the brain. “This technology will be transformative,” says Tomas Kirchhausen, a cell biologist at Harvard Medical School in Boston, Massachusetts, and another early adopter of these tools. “I’m ready to put my other instruments on eBay.”

NATURE.COM
For an animation about Stefan Hell and his work visit: go.nature.com/hhrbee

DANIEL MIETCHEN


A SHARPER IMAGE 

Hell was the first to break the existing limits on light microscopes. But not too long ago, he told the Lindau audience, no one took him seriously (a common remark made by Nobel prize winners). Ideas about the limits of imaging were not only set in scientists’ minds, but literally set in stone. On the lecture screen, Hell showed a photograph of a monument to one of his scientific forebears, Ernst Abbe, in Jena, Germany — a stone engraved with the equation describing the diffraction limit.

Abbe rose from humble beginnings to become a brilliant optical physicist at German microscope manufacturer Zeiss. He was the first to understand how images are made in light microscopes, explained Hell, and he was able to make the best instruments. Abbe figured out that when a beam of light is focused on a spot whose size is smaller than about half its wavelength, it starts to interfere with itself, and the image becomes blurred. If two objects are closer together than that distance — a couple of hundred nanometres — they cannot be resolved under a conventional light microscope no matter how perfect the lens. This is Abbe’s diffraction limit.
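
To put a number on that limit, the short Python sketch below evaluates the relation engraved on Abbe’s monument, d = λ / (2n sin α), where λ is the wavelength of the light, n is the refractive index of the medium between lens and sample, and α is the half-angle of the cone of light that the lens collects. The function name and the example values (green light at 500 nanometres, an oil-immersion lens with n of about 1.5) are illustrative assumptions, not figures taken from Hell’s lecture.

```python
import math


def abbe_limit_nm(wavelength_nm, refractive_index=1.0, half_angle_deg=90.0):
    """Smallest resolvable separation d = wavelength / (2 * n * sin(alpha))."""
    # n * sin(alpha) is the numerical aperture of the objective lens.
    numerical_aperture = refractive_index * math.sin(math.radians(half_angle_deg))
    return wavelength_nm / (2.0 * numerical_aperture)


# Assumed example values: green light and an oil-immersion objective.
d = abbe_limit_nm(wavelength_nm=500.0, refractive_index=1.515, half_angle_deg=67.5)
print(f"Diffraction limit: about {d:.0f} nm")
```

With these assumed values the limit comes out at roughly 180 nanometres, in line with the “couple of hundred nanometres” quoted above. In the idealized case of n = 1 and α = 90°, the function returns exactly half the wavelength.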

It is in large part thanks to Abbe’s work that light microscopes have been powerful research tools. “I have huge respect for Abbe,” says Hell, who named his own microscopy company, based in Göttingen, Germany, Abberior. Abbe was not wrong; he was a product of his time. In 1873, when he published his work on the diffraction limit, the molecule was an unproven idea and making better microscopes was all about developments in optical science.

Hell realized that chemistry could take things further. Below the diffraction limit is a rich microbiological world that includes viruses, proteins and the fine details of organelles such as the energy-generating mitochondria. Hell was determined to bring this world to light, and became obsessed with breaking the diffraction limit. He focused on a technique called fluorescence microscopy, a common tool used by biologists. The method is a form of light microscopy, augmented by fluorescent chemical labels. These bright red, green or other coloured tags attach to the specific molecules that a biologist is interested in and act as beacons to help locate particular types of cells or structures within cells.

A live cell captured by a type of microscopy developed by Nobel prizewinner Eric Betzig.

In the 1990s, Hell read an article about a quantum mechanical phenomenon called stimulated emission — a way to control the fluorescence of these labels — and was “electrified”. He realized that imaging could be about more than just great lenses — he could use chemistry to get beyond the diffraction limit.

Hell designed a system that used two beams of light, one to stimulate fluorescent molecules, and another that immediately turned most of them off in such a way that only those at the centre of the light beam continued to shine¹. The technique, which he called stimulated emission depletion (STED) microscopy, does not break the diffraction limit but, as Hell puts it, plays a game to get around it. The resulting cylindrical beam — shaped like the outline of a donut with the inner circle filled in — is then scanned over the sample to get the full picture (see ‘Under the microscope’). Hell’s first super-resolution microscope in 2000 only exceeded the diffraction limit by around a factor of two, but it brought the barrier down.
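
A rough feel for how STED gets around the limit comes from a square-root scaling law commonly quoted in the STED literature (it is not spelled out in this article): the effective spot size is the diffraction limit divided by √(1 + I/I_s), where I is the intensity of the switching-off beam and I_s is a saturation intensity characteristic of the dye. The Python sketch below applies that formula; the function name and the numbers are illustrative assumptions.

```python
import math


def sted_resolution_nm(wavelength_nm, numerical_aperture, saturation_factor):
    """Approximate STED spot size: Abbe limit / sqrt(1 + I / I_s)."""
    if saturation_factor < 0:
        raise ValueError("saturation_factor (I / I_s) must be non-negative")
    abbe_limit_nm = wavelength_nm / (2.0 * numerical_aperture)
    return abbe_limit_nm / math.sqrt(1.0 + saturation_factor)


# Assumed example: 500 nm light and a numerical aperture of 1.4.
for factor in (0, 3, 99):
    spot = sted_resolution_nm(500.0, 1.4, factor)
    print(f"I/I_s = {factor:>2}: spot size about {spot:.0f} nm")
```

With the switching-off beam absent (I/I_s = 0) the function returns the ordinary diffraction limit. A ratio of 3 halves the spot size, the same size of gain as the factor of two described above, and larger ratios shrink the spot further.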

Hell is a self-described dreamer, and felt shut out of the mainstream of science before his Nobel-winning discovery. Betzig, by contrast, cultivates the aura of a cranky outsider; in his talk at Lindau this year, he made it clear that he has no patience for the strictures of academia. Betzig calls himself a “tool-builder” — an engineer who wants to build the best microscopes possible.

Betzig was also long obsessed with surpassing Abbe’s diffraction limit. Working at the renowned research centre Bell Labs in Murray Hill, New Jersey, in the early 1990s, he made advances in near-field microscopy. But he grew frustrated with research on what he did not consider to be a very useful tool. Near-field microscopy technically beats the diffraction limit, but it is not very practical because it uses a sharp imaging tip that must be extremely close to the sample. Betzig left research for several years and worked at his father’s company, designing machine-tooling equipment. After inventing a design that the company could not sell, he had what he calls his second midlife crisis. He kept up-to-date with the scientific literature, and reading about green fluorescent proteins brought him back to science — into the same sphere in which Moerner was […] returned to the […]tion, Betzig’s relies on chemistry, in this case a phenomenon called photoswitching that Moerner was studying². Working at IBM in San Jose, California, Moerner was the first to measure the absorption spectrum of a single molecule³. Then, in 1997, he went on to show that certain fluorescent molecules could be turned on and off with a beam of light. The underlying physics are different from STED, but the implication is the same: being able to turn fluorescent molecules on and off as though they were light bulbs.

Xiaowei Zhuang developed the super-resolution microscopy method STORM.

Hell’s microscopes raster a narrow beam over a sample to fill in the image zone by zone, whereas Betzig’s work combines several faint snaps of the whole imaging zone to make a complete picture. In the latter case, before each image capture, the microscope illuminates the sample with a weak beam of light that only turns on a small fraction of the fluorescent molecules in the sample. This is repeated again and again until all of the molecules have been located. Even if two molecules are closer together than the diffraction limit, they will appear in different sub-images, so they can all be seen in composite.
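
The logic of this sparse-switching approach can be sketched in a few lines of Python. The toy model below is one-dimensional and entirely hypothetical: two emitters sit 50 nanometres apart, every photon is blurred by a Gaussian of width 100 nanometres to stand in for diffraction, and exactly one emitter is switched on per frame. None of the names or numbers come from Betzig’s or Moerner’s work.

```python
import random


def localize_frames(emitter_positions_nm, frames=200, photons_per_frame=400,
                    psf_sigma_nm=100.0, seed=0):
    """Return one estimated position (in nm) per simulated camera frame."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(frames):
        # Sparse activation: only one randomly chosen emitter shines per frame.
        true_position = rng.choice(emitter_positions_nm)
        # Each detected photon lands at a diffraction-blurred position.
        photons = [rng.gauss(true_position, psf_sigma_nm)
                   for _ in range(photons_per_frame)]
        # The centre of the blurred spot is known far better than its width.
        estimates.append(sum(photons) / len(photons))
    return estimates


def split_into_two_clusters(estimates_nm):
    """Split 1D estimates at the midpoint of their range; return both means."""
    midpoint = (min(estimates_nm) + max(estimates_nm)) / 2.0
    left = [x for x in estimates_nm if x < midpoint]
    right = [x for x in estimates_nm if x >= midpoint]
    return sum(left) / len(left), sum(right) / len(right)


# Two emitters only 50 nm apart, well inside the 100 nm blur of each photon.
estimates = localize_frames([0.0, 50.0])
first, second = split_into_two_clusters(estimates)
print(f"Recovered emitters near {first:.1f} nm and {second:.1f} nm")
```

Because each frame contains light from a single emitter, averaging its photons pins down the centre of the spot to a few nanometres, and the accumulated estimates fall into two separate clusters even though the blurred spots themselves overlap almost completely. Real instruments work in two or three dimensions and must cope with background light and overlapping emitters, which this sketch ignores.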


CELLS IN FOCUS 

Biologists trying out the new technologies are amazed by what is possible. In the seventeenth century, before he examined a slice of cork that he had put under his microscope, Robert Hooke did not know that cells existed — no one did. But there they were: “... indeed the first microscopical pores I ever saw, and perhaps, that were ever seen, for I had not met with any Writer or Person, that had made any mention of them before this,” he wrote in 1665. Stories from early users of super-resolution imaging are reminiscent of Hooke. “Super-resolution is not just about getting sharper pictures,” says Xiaowei Zhuang, a biophysicist at Harvard University in Cambridge, Massachusetts, who developed another super-resolution method called Stochastic Optical Reconstruction Microscopy (STORM), which she and her colleagues published in 2006 (ref. 4). “You can discover things that have never been seen before.”

Using STORM to look at the structural protein actin in neurons in 2012, she saw something surprising and new. Zhuang is interested in synapses, the electrochemical connections between neurons that are the basic units of brain circuits. She had asked her postdoctoral researchers to image the structure of synapses by mapping out actin at high resolution. But the most noticeable feature of the pictures turned out to lie next to the synapses: in the images, the axons — the slender projections that transmit neurons’ outgoing signals — looked odd.

All along the axons, the team saw regularly spaced rings of actin. “It’s a remarkable periodic structure — every 180 nanometres, an actin ring,” says Zhuang. “It’s beautifully laid out, it almost looks like the structure of a crystal.” The features of this pattern are under the diffraction limit, and so the rings had been previously invisible. Under a conventional fluorescence microscope you can see a smear indicating the presence of actin, but the periodic rings, so distinct under Zhuang’s STORM microscope, are a blur. Nor had anyone noticed the rings under an electron microscope because other filamentous structures in the cell obscure them, she says.

Zhuang has found that the actin rings are present in all axons, but only in patches on a small fraction of dendrites, the neural projections that receive signals. The ‘axonal skeleton’ helps axons to maintain their mechanical stability and conduct impulses. This understanding is supported by other studies⁵, which show that worms genetically engineered to lack the protein that connects the rings have fragile axons, impaired movement and a reduced response to mechanical stimuli.

Harvard's Kirchhausen has had similarly 
eye-opening moments using a microscope 
built with Betzig, which uses thin sheets of 
light to illuminate one planar slice of a sample 
at a time and image it using Betzig’s super-reso- 
lution method. It is the most recent example of 
Betzig’s work. Kirchhausen has spent decades 
studying the formation of membrane bubbles 
called vesicles that are just 30-80 nanometres 
in diameter. Vesicles carry chemicals around 
the cell and also help the cell engulf, transport 
and break down unwanted molecules and 
microbes. In 2004, Kirchhausen completed 
the first image of the structure of the pro- 
tein complex that pinches off vesicles from a 
membrane. He used electron microscopy to
generate images with near-atomic resolution,
going down to 0.8 nanometres. But although he
was making progress in determining the structure
of vesicles, he says, “I realized I should be
paying attention to the dynamics.”
Using one of Betzig’s machines, he was finally 
able to watch the process. Over a period of 
30-90 seconds, hundreds of molecules assem- 
ble into a kind of geodesic dome that pulls out a 
bubble of membrane. Then they all fall away to 
leave the vesicle. Kirchhausen can now watch

UNDER THE MICROSCOPE


Our understanding of cells such as neurons has benefited 
from advances in imaging. By Katherine Bourzac.


LIGHT MICROSCOPY SIXTEENTH CENTURY 


The first light microscopes used 
relatively simple arrangements of 
lenses to magnify tissue; later versions 
used external illumination and stains to 
make features even clearer. The original 
inventor is disputed, but the first 
microscopes are thought to have 
emerged from work on telescopes. 


ELECTRON MICROSCOPY 1933 


Physicist Ernst Ruska created the 
first microscope to form an image 
by bouncing beams of electrons off 
a sample. He shared the 1986 
Nobel Prize in Physics for this work. 
Electron microscopes can resolve 
atomic-scale detail — as static, 
structural images — because the 
wavelength of an electron is much 
shorter than that of light. 


FLUORESCENCE MICROSCOPY 1990s 


This technique allows researchers to 
locate parts of the cell using chemistry. 
Molecules of interest can be labelled 
by attaching fluorescent tags to 
proteins, and gene expression can be 
monitored by inserting a fluorescent 
protein gene into the genome (a 
method recognized by the Nobel Prize 
in Chemistry in 2008). 


SUPER RESOLUTION 2000 onwards 


The latest microscopes take 
resolution to the nanoscale. The 
subject of the 2014 Nobel Prize in 
Chemistry, super-resolution 
microscopy zooms in on individual 
molecules and subcellular 
structures by computationally 
combining many fluorescence 
images. Stimulated emission 
depletion, or STED, microscopy is 
an example of this technique. 


[Diagram of a neuron. Labels: axon terminal, myelin, Schwann cell, nucleus.]


Under a light microscope,
neuroscientist Santiago
Ramón y Cajal saw the
delicate branching of 
dendrites. He shared the 
Nobel Prize in Physiology 
or Medicine in 1906 with 
Camillo Golgi for their 
work on the structure of 
the nervous system. 


Electron microscopes 
revealed the branching or 
‘spines’ on the dendrites of 
neurons for the first time. 
In 1959, Edward George 
Gray published images of 
dendritic spines and 
established that the 
connections between 
neurons (synapses) occur 
at these sites. 


Fluorescent imaging advanced 
our understanding of the cell 
by revealing how the protein 
actin helps in cell growth and 
migration. Labelling actin with 
fluorescent dyes, researchers 
saw that it propels the leading 
edge of growth in axons, which 
carry neurons’ outgoing 
electrochemical signals. 


In 2013, biophysicist 
Xiaowei Zhuang used 
super-resolution 
microscopy to show that 
axons have rings of actin 
along their length, like 
tiny, regularly spaced
belts. These belts are 
believed to provide 
structural support. 


LIFE IN FOCUS


Optical microscope

Under a conventional optical microscope, the resolution is limited to a
few hundred nanometres, and the details of a mitochondrion are blurry.


STED microscopy

STED microscopy uses an exciting beam to image a sample and a quenching
beam to cancel out part of the beam, narrowing it and increasing the
resolution. At a resolution of tens of nanometres, the features of the
mitochondrion become visible, revealing the folds of the organelle’s
internal structure.

[Figure labels: exciting laser beam; quenching laser beam.]


Stefan Hell was jointly awarded the Nobel prize for his work on super-resolution microscopy.


this happen thousands of times a minute, 
throughout a cell. 

As soon as the super-resolution microscope 
was up and running in 2014, Kirchhausen 
started using it for various other projects 
requiring super-resolution in 3D. “We sprinkle 
viruses on a cell and watch how they’re taken 
in,” he says. “Now we can track every virus as 
it enters the cell and follow its fate.” Which 
viruses succeed? Which don't? And what hap- 
pens as a virus enters a cell? Those are ques- 
tions Kirchhausen can now ask. 


CHALLENGES AHEAD 

Using super-resolution microscopes is not like 
working an old-fashioned light microscope. 
“There are no binoculars. Everything is done
in the computer,” says Kirchhausen. “You can
see a crude version of what is coming off the
microscope, but it takes some time afterwards
to render it into something useful,” says Mullins.
These methods are based on imaging huge
numbers of single molecules at a time. That 
is something that neither the human eye nor 
brain can make sense of unless powerful soft- 
ware has processed it first. 

Most image-processing software has been 
designed to work with 2D, low-resolution 
images, explains Mullins. For his work, he 
collaborated with specialists in processing 3D 
images (previously for computer models of 
individual molecules such as proteins) at the 
University of California. The video of the
neutrophil crawling through a couple of micrometres
of tissue was acquired in just 3 minutes in a
single experiment that yielded 25 gigabytes of 
data. A typical high-definition Hollywood film 
is about one-tenth of that size. Data sharing 
and storage are becoming a problem. 

Hardware is another barrier to using super- 
resolution microscopy. Moerner is using 
super-resolution microscopes based on both
Betzig’s and Hell’s designs in his lab. His set-ups
are complex. A square metre or so of about 
30 precisely placed optical components direct 
the imaging light. The microscopes have to be 
housed on a vibration-isolating table that is 
common in physics departments, but not in 
wet labs. 

Such vulnerability is not ideal. Companies 
including Zeiss and Hell’s start-up are designing
commercial models that put more compact,
fixed versions of these set-ups inside a case. So
far the models on the market have not been
very good, says Betzig, but that will change.

This move towards 
commercialization does not signal that the 
fundamental research is slowing down. Hell’s 
dream now, he says, is to map every protein 
in living tissue in real time at a resolution of 
1 nanometre, without causing the tissue any 
damage. Zhuang’s dream is similar. One goal 
in her lab is to follow the waxing and wan- 
ing of gene expression at the genome scale in 
individual cells — that would mean imaging 
the full orchestra of messenger RNA in cells 
at the same time. This April, she published 
results from her first attempt, which used a 
combination of single-molecule imaging and 
computational savvy to simultaneously detect 
about 1,000 different kinds of RNA molecules 
in one cell. Because she is interested in the
brain, Zhuang particularly wants to watch how 
neurons develop, to understand how their dif- 
ferent behaviour emerges from gene expres- 
sion patterns. 

Another of Zhuang’s goals is to track com- 
munication between neurons at their syn- 
apses, to watch these cell-to-cell chemical 
communications just as electrophysiologists
can measure a single neuron’s electrical signals.
One of her projects is to follow computation
in individual neurons. Currently, no one
fully understands how a neuron integrates 
incoming signals and then generates an out- 
put. Once that question can be answered at 
the level of a single cell, observations can be 
turned towards figuring out the processes of 
computation in small circuits, and eventually 
in the whole brain. 

Imaging individual cells with these meth- 
ods is now possible, but it is more difficult to 
take speedy, single-molecule resolution vid- 
eos at the tissue level, for example in the brain. 
Today’s systems are held back by the brightness 
and switching response of existing imaging 
dyes, and the limited number of colours that 
can be used at once. 

To help them advance the field, Zhuang, Hell, 
and other imaging researchers are looking to 
chemistry. “The next frontier is going to be new 
dye colours,” says Luke Lavis, a chemist at the
Janelia Research Campus who develops fluo- 
rescent imaging tags. Creating dyes that attach 
to biomolecules in ways that disrupt their func- 
tion as little as possible — and making a whole 
rainbow of dyes with colours distinct enough 
to be separated during image processing — 
will make it possible to view more moving 
molecules simultaneously. And making dyes 
brighter, or designing them so that they will 
not fluoresce until they attach to their target, 
might stop cells from becoming badly damaged 
by repeated blasts of high-energy stimulating
light. Lavis is working on it. “When I started 
out, people would be using synthesis methods 
reported in 1888, but chemistry has evolved so 
much since then. Now we can really explore the 
structures of these dyes,” he says. 

Watching Mullins explore the subcellular 
world in his 3D goggles, it is clear that the 
details unveiled by today’s microscopes would 
astound researchers such as Abbe and Hooke. 
Nonetheless, super-resolution imaging is not 
all about seeing and quantifying cells as col- 
lections of molecules. “Until I watched this, I 
didn’t understand the psychology of the cell,” 
says Mullins, only half joking. In this footage, 
the cell is not an abstract object. It responds 
to mechanical forces and it pushes back. The 
cell has a palpable physical presence. And that, 
compared to all means of watching it before, 
somehow makes it seem more alive.


Katherine Bourzac is a science journalist

Q&A Susumu Tonegawa
Memory man 


Susumu Tonegawa unlocked the genetic secrets behind antibodies’ diverse structures, which 
earned him the Nobel Prize in Physiology or Medicine in 1987. Having since moved fields, he tells 
Keikantse Matlhagela about his latest work on the neuroscience of happy and sad memories. 


You started as a chemist, then you moved
into molecular biology and now you are a
neuroscientist. Why change fields?

Strangely, the only people to ask me about 
this are journalists — my students never ask. 
I see myself as a scientist who is interested in 
what’s going on inside of us. It doesn’t mat- 
ter whether it is chemistry or immunology 
or neuroscience, I just do research on what I 
find interesting. The switch from chemistry to 
immunology did not seem like a big shift when 
I was young, but immunology to neuroscience 
was. After about 15 years spent researching 
immunology I wanted to explore an area of
science where there are still big, unresolved
questions. The brain is probably the most 
mysterious subject there is. 


Do you keep up to date with the field in which 
you won your Nobel prize? 

I am sorry to say that I haven't been paying a
lot of attention to immunology in recent years 
because I am preoccupied with my work on 
memory. I have friends, of course, from that 
time — very close friends. But my friends 
are not young. Even though they are experts, 
they are also retired. We tend not to talk about 
immunology a whole lot.


Is it helpful in neuroscience research to have a
multidisciplinary background? 

The brain is hugely complicated, and because it 
is so complicated it requires multidisciplinary 
research. You need mathematics to model how 
brain networks function. You need chemistry, 
molecular biology and behavioural studies of 
animals to answer other neuroscience ques- 
tions. Neuroscience is totally multidisciplinary. 


What have you learnt about the brain circuits 
involved in positive and negative emotions? 
Imagine that a week ago you were on a vaca- 
tion — you went to a Caribbean island and had 
a great time — and you remember the detail of 
what happened during your vacation. Those 
memories would be examples of ‘episodic
memory’. Sometimes episodes come with no
emotional content, but often they come with 
a positive or negative slant — in other words, 
they were either pleasurable or unpleasant 
experiences. 

My lab has been studying the part of the 
brain called the hippocampus to investigate 
its role in the formation of episodic memory, 
and how that varies with positive and nega- 
tive emotional content. Our results indicate 
that there is a kind of competition between 
brain circuits to be able to assign a positive or a 
negative value to the memories. We have taken 
advantage of this understanding in our experi- 
ments on mice and have shown how depres- 
sion can be reversed or repressed. 


How do you tell if a mouse is depressed? 

It is similar to the way in which you tell if a 
human is depressed. There are at least two 
symptoms. When a depressed patient encoun- 
ters something difficult to resolve or improve, 
they give up more easily than people without 
depression. Another symptom is called anhe- 
donia, which is an inability to enjoy normally 
pleasurable experiences. Therefore, depressed 
people don’t seek out normally pleasurable 
experiences. The same goes for mice with 
depression. 


Have you identified a target protein or group of 
proteins involved in mouse depression? 
No. But we have found specific target cells that 
hold pleasurable episodic memories. They are 
deep inside the brain. There is an area of the 
brain that in male mice, for example, holds 
information about a specific, playful encounter
with female mice. We have developed technol- 
ogy to identify these cells and we have geneti- 
cally manipulated mice so that the cells express 
a light-sensitive protein. If you shine blue light 
onto them the cells become activated, meaning 
that the mouse recalls the positive experience. 
Going back to the depression model: if a mouse 
is depressed, activating
you put the mouse in a difficult situation that
it would have previously given up on, it now 
makes an effort to solve the problem. 


Have you tested this in higher animals? 

Not yet. Unfortunately, the technology that 
we use for mice is not directly applicable to 
humans. But I know researchers are working 
on how to replicate in humans something 
similar to what we did in mice and eventually 
we can expect that this will become possible. 
Hopefully, one day, our findings will lead to a
new kind of therapy for depression. 


I am a researcher from southern Africa,
where investment in science is low. What 
career advice would you give to young 
scientists who come from countries that are 
not known for their science research? 

When I was a student in Japan in the 1960s, I
became fascinated by molecular biology. 
But there was no molecular biology in my 
country. So I had to go to graduate school 
abroad. I was fortunate to have the opportu- 
nity to study in the United States and I stayed 
abroad, including in Europe. My antibody 
work was entirely done in Switzerland. I 
didn’t have much of a relationship with Japan
until very recently. But now, after many years, 
I try to help Japanese science. 

For a while, you may have to be trained 
outside of Africa. Since you are young, if you 
really want to do science I think there will be a
way for you to go abroad and receive training. 
Then you could go back to southern Africa 
and try to help science in your country; if 
you get good training your knowledge will 
become useful not only to you, but eventually 
also to the community in which you grew up. 


What are your interests outside of science? 

I do not have anything that interests me as 
much as science. But, later in my life I was 
introduced to music by my wife and our chil- 
dren. I go to concerts with them and I enjoy 
it. My daughter plays the violin quite well. My 
younger son played cello and piano. I am not
sure what I will focus on if I stop working in
the lab. Once in a while I wonder what I am
going to do if and when I retire. Is there such a
thing as retirement for me? I can’t imagine it.


This interview has been edited for length and clarity. 


Keikantse Matlhagela is an HIV 
researcher and 
lecturer at the 
University of 
Botswana’s Faculty 
of Medicine. 
Previously, she was 
a Fogarty Research 
Fellow at the 
Harvard School of 
Public Health.

Q&A Elizabeth Blackburn
End-game winner 


Elizabeth Blackburn shared the 2009 Nobel Prize in Physiology or Medicine with Carol Greider 
and Jack Szostak for their work on telomeres — the protective caps at the end of chromosomes 
— and for identifying the enzyme telomerase, which maintains telomere length. Now at the 
University of California, San Francisco, she offers Elena Tucker an insight into her life inside 
and outside academia. 


Why did you choose to study the ends of 
chromosomes? 

The driver for me was wanting to understand 
how life works, rather than solving a particu- 
lar problem that afflicts humans. When I was 
finishing my doctoral work at Fred Sanger’s lab 
in Cambridge, UK, DNA-sequencing meth- 
ods were embryonic. At the time, in the early 
1970s, it was hard to sequence DNA except at 
the ends of relatively short DNA molecules. 
That was just what was possible. Research- 
ers had looked at the DNA of viruses such as 
bacteriophages, but I wanted to know what 
goes on inside the nuclei of cells with real 
chromosomes. I heard that Joe Gall at Yale 
University in New Haven, Connecticut, had 
discovered very short, linear chromosomes in
the cell nucleus of the eukaryotic protozoan
Tetrahymena, and I thought I might be able 
to sequence the ends of those with the limited 
technology available at the time. 


How did your research get off the ground? 
Once at Yale I started analysing the ends of 
the short Tetrahymena chromosomes. What 
I found was really odd. I had expected to see 
something similar to the single-strand over- 
hanging DNA observed at the ends of bac- 
teriophage linear DNA, but telomeres were 
different. They
of these repeats varied between molecules
and over time. This changeability was a really 
strong clue of enzyme involvement in main- 
taining them. Once I had my own lab at the 
University of California, Berkeley, I extended 
the telomere research to yeast, collaborating 
with Jack Szostak at Harvard Medical School 
in Boston, Massachusetts. We demonstrated 
that this changeability is not unique to one kind 
of organism, and that it might be universal to 
eukaryotes. So I started searching for a new 
kind of enzyme. That was when my then-grad- 
uate student Carol Greider and I discovered 
telomerase. 

We also found that we could make cells quite 
miserable if we disrupted their telomeres. The 
molecular puzzles became more and more 
intriguing. Moving to the University of California,
San Francisco, with its medical school
has led me towards all sorts of systems-wide 
questions around human health and telomeres. 
I started collaborations to study how chronic 
stress can affect physiology, which I would 
never have dreamed of asking on my own. 


Why did you decide to enter, and then exit, the 
start-up world? 

Telomere length is not usually by itself diag- 
nostic, but more a general measure of health. 
Because of the correlation between long
telomeres and a long and healthy life, over the
years people have realized that there may be 
some benefit in measuring the length of their 
telomeres. Our lab has been quite good at that 
ever since telomeres have been on the radar. 
We realized that one day, just as you might 
send off a DNA sample for sequencing, you 
might also send a sample to a company that 
could measure your telomeres. I set up
Telome Health with a couple of others, but
when it started going
thought were not
scientifically driven
I gave away all my
equity to the University of California — as 
did my co-founder Elissa Epel. All in all, the 
adventure of starting something and talking 
with people in the finance world was a positive 
experience. I learned to admire the skills that 
business people need, realizing that scientists 
have a lot to learn from their world. 




What are the main unanswered questions 
involving telomeres? 

I would like to know how a telomere really 
works. We have a parts list, so we know what 
it does in a static sense. But in live cells, tel- 
omeres are extraordinarily dynamic. They are 
complex little ecosystems that constantly have 
proteins arriving and leaving every second. I 
think we could learn a huge amount by study- 
ing telomeres in action, rather like researchers 
do by watching active ribosomes assemble pro- 
teins, in addition to knowing their structure. 
The second unanswered question is how we 
might be able to modify telomere maintenance 
to make human bodies more resilient, given 
the correlation between having long telom- 
eres and living healthily and for a long time. I 
would love to see humanity achieve a situation 
in which, although we get old and decrepit and 
things go wrong, we can improve health in the 
elderly far more than is achieved today. 


What happened during your time on the 
President’s Council on Bioethics, which was 
established by former US president George W. 
Bush and disbanded by Barack Obama? 
When you are a scientist, you are part of this 
community so you serve on a lot of commit- 
tees, and sometimes they are National Advi- 
sory Commissions. I knew the Bush council 
had political implications when I joined. I kept 
being a nag about getting the science right in 
reports, and they threw me off! That got a lot 
of attention because it occurred in the context 
of other developments that spoke to the issue 
of the use of evidence in policymaking. There 
were many things at stake. For example, some 
in government at that time minimized the 
evidence for climate change. I do believe that 
although there might be other considerations 
besides scientific evidence in policymaking,
politically impartial research should always
be the bedrock of policy decisions. 


Apart from your Nobel prize, you’ve received a 
L’Oréal-UNESCO For Women in Science Award
and been one of Time magazine’s 100 most 
influential people. What achievement are you 
most proud of? 

What I’m truly proud of is having done the sci- 
ence and I'm proud of the people who work with 
me — I’m proud that we've done these things 
together. The awards are really just symbols. But 
in the sense that symbols influence people in 
subtle ways, they matter. If I’m photographed
receiving an award, the picture makes the point 
that there's a woman winning a science prize — 
that aspect of an award is important to me. 


How has your experience of motherhood 
influenced your career and vice versa? 

For years I honestly didn’t think much about 
having children. People would often say “you’d
better have a baby soon if you’re turning 30”,
but I didn't listen. I became pregnant in my late 
thirties when my career was very much under- 
way — I found out I was pregnant in the same 
week that I was promoted to full professor at 
Berkeley! I feel very lucky, but I don't think my 
path is necessarily a recipe for happiness that 
others should follow. Children can happen at 
any time, and the challenges will be different 
depending on where you are in your career. 


When you were a child, did your interest in 
science make you feel different from your 
peers? 

I always felt like a fish out of water. I had good 
friends growing up and we would do stuff that 
was normal in Tasmania, Australia, like swim 
after school and then eat disgustingly delicious 
meat pies slathered with tomato sauce. But I 
wouldn’t yap to my friends about science. I
would talk about Beatles songs. I wasn't afraid 
of being thought weird, though talking sci- 
ence seemed to come across as pretentious. 
No one was overtly nasty, but I did sometimes 
feel pushed away, and that was hurtful. Self- 
preservation is a useful life lesson. I suppose I 
always knew that I was different. I was thrilled 
with the idea that I could go and do a PhD, 
and do it outside Australia to really expand my 
horizons.


This interview has been edited for length and clarity. 


Elena Tucker is a 
Peter Doherty
rare human
disorders.

Q&A Richard Roberts
Microbe cheerleader 


Richard Roberts shared the 1993 Nobel Prize in Physiology or Medicine with Phillip Sharp
for their discoveries of split genes, which contain parts that encode protein, called exons, and
gaps between them, called introns. Now chief scientific officer at New England Biolabs based in 
Ipswich, Massachusetts, Roberts talks to Gijsbert Werner about microbes, genetically modified 
food and the problem with Nobel prizes. 


In your Lindau lecture this year you talked 
about genetically modified organisms (GMOs). 
Are people right to worry about them? 
Frankly, they are not. We have been geneti- 
cally modifying everything we eat for more 
than 5,000 years. We have been improving 
plants by ‘natural’ breeding since the origin of 
agriculture. When we breed plants, we make 
hybrids — and typically move hundreds of 
genes from one plant to another. You don't 
know what those genes are. You don't know 
where they go. And you don't know how these 
genes are influenced by moving them. Genetic 
engineering is just a better way of doing what 
we have been doing for the past 5,000 years. 
The argument that inserting bacterial genes 
into plants is a break with the past is invalid 
because, to pick an example, there is very 
good evidence that the sweet potato genome
contains bacterial genes. It doesn’t make sense
to think that new methods of altering plant 
genomes will be inherently dangerous. Genes 
are genes; it is what they do that matters. We 
need to test whether the products are safe, not 
worry about the process of creating them. This 
argument extends to the potential ecosystem 
effects of GMOs. I do worry about ecosystems, 
but there is no special risk to them from plants 
created using these new methods. 


One of your main interests is microbes — 
indeed you gave a lecture about why we 
should love them at Lindau last year. Why did 
you feel this was necessary? 

The vast majority of the microbes that live with 
us are good. But bacteria have a bad reputation 
because science has focused on the ones that 
cause disease. Biologists are finally starting to
realize that by manipulating and controlling
microorganisms, we can probably do more for 
human health than by any other means. The 
nice thing about this kind of medicine is that it 
would be cheap. We should explore all sorts of 
ways to make bacteria more beneficial, includ- 
ing genetic engineering. If you can cure disease 
by manipulating the microbiome, that is going 
to save a lot of money and will probably also 
teach us how to live better. I love bacteria. 


Has biotechnology focused too much on the 
health of the human host without considering 
its microbial colonizers? 

I absolutely think we have gone overboard in 
studying humans as humans. We need to study 
good bacteria in the context of their human 
ecosystems. Until recently, microbiologists did 
almost no work on good bacteria, which means 
that these organisms are under-appreciated 
even though they are an incredibly important 
part of us. That is a big mistake. The average 
human contains two to five pounds of bacte- 
ria! They provide protection against pathogens 
and prime our immune systems. If I were to kill
all the bacteria that live in or on you, you would 
probably die. It is as simple as that. We know 
this because bacteria-free individuals of other 
species die young. 


Why are you so passionate in your support of
GM food?

I feel that scientists need to provide more legiti- 
macy to GMOs. A lot of people cannot grasp the 
nuances of the relevant science, but respect and 
listen when prominent scientists — particularly 
Nobel laureates — speak up. I want to make sure 
the general public receives the benefits of GM 
food, but also understands its limitations. The 
fabrications that the anti-GMO people have 
used to scare the population worry me very 
much. I would really like to convince green parties
of the benefits of GMOs. In general, I support
green parties. I think they just made a mistake 
in opposing GM foods — and they did it not 
because they were against genetic modification 
per se, but because they were afraid that multina- 
tionals were going to take over the food supply. 


New techniques are making gene technology 
available to much smaller organizations than 
ever before. If what the anti-GMO lobby really 
cares about is multinationals taking over, 
might these techniques increase acceptance 
of GMOs? 

The way to think about this is to consider 
evolution as a very slow process. Plants might 
eventually adapt to global warming, but if they 
don't adapt fast enough we won’t have enough
to eat. Genetic modification is a fast way of 
doing things. If we do not interfere and ‘help’ 
evolution where we can, an awful lot of people 
are going to die unnecessarily, particularly in 
the developing world. There are opportunities 
to really get something done here, and there are 
strong moral arguments. And there is no reason
why small companies or non-governmental
organizations cannot make a big impact and 
significantly help the developing world. 


As well as the new GMO initiative, you also 
signed the Mainau Declaration on climate 
change and campaigned in China for the 
release of Nobel peace laureate Liu Xiaobo. 
Do you consider it a responsibility to use your 
Nobel laureate status for the public good? 

A Nobel prize is something rather special. 
Almost all of the laureates here in Lindau were 
awarded a Nobel prize because we were lucky. 
It is not that we are super smart or better than
anybody else, but because we made a seren- 
dipitous discovery along the way. For whatever 
reason, when you win a Nobel prize people listen
to you who never listened before. That
means two things. The first is that you should
use the opportunity to do good in the world,
if you can. The second is that you should also
be careful about what
you say because you might not always be right.
There are plenty of issues in which Nobel lau- 
reates could have been helpful, but they were 
rarely politically organized in the past. We 
tried to get Aung San Suu Kyi released from 
house arrest in Myanmar. Even though that 
was not successful, it showed that we laureates 
can come together — 225 of us signed letters 
that were sent to the Chinese and Burmese 
governments. 


What is the future of the Nobel prizes in the 
era of big collaborative science, in the light of 
projects such as ENCODE, the Encyclopedia 
of DNA Elements? 

Many of the major steps forward in biol- 
ogy have been made by individuals or small 
groups of individuals. Our knowledge of biol- 
ogy is so limited, we are still at the starting 
point of understanding how organisms work 
and there are still terrific roles for individuals. 
But, in general, I am not sure science prizes 
are a particularly good thing. They are won- 
derful for the people who win them, and can 
be terrible for those who don't. I think they
end up causing rather a lot of heartbreak.


This interview has been edited for length and clarity. 
Q&A Bruce Beutler
Chance encounters 


Bruce Beutler is director of the Center for the Genetics of Host Defense at UT Southwestern 
in Dallas, Texas. He shared one half of the 2011 Nobel Prize in Physiology or Medicine 
with Jules Hoffmann for their work on the activation of innate immunity; the other half of 
the prize was awarded to Ralph Steinman. Here, Beutler talks to Christoph Thaiss about 
biological puzzles and intuition. 


The discoveries that have resulted from 
your work are often referred to as the 
second revolution in immunology — the 
elucidation of how innate immunity 
operates — with the first revolution being 
adaptive immunity. Will there be a third 
revolution? 

I hope there will be third, fourth and fifth 
revolutions. People always seem to overestimate
what they already know, and we
certainly know very little about how the
immune system functions. If we think of 
the immune system as a machine, then we 
are far from even knowing all of its parts. 
We cannot predict 
autoimmune disease. And we do not know
who will and who will not respond to a vac- 
cine. Much more remains to be discovered. 


How have the main challenges in medicine 
changed during your career? 

It is interesting that the most challenging dis- 
eases — cancer, diabetes and Alzheimer’s to 
name a few — have not really changed since 
the late 1970s, when I was in medical school, 
with the exception of emerging infectious dis- 
eases. Some conditions are easier to address 
now, but the major issues remain. Autoimmunity
is probably the next frontier. The majority of
cases of autoimmune disease result from a complex
genetic problem that has environmental
influences. It is a colossal task for the immune
system to maintain tolerance to self and yet
be ready to react to everything in the world
around us. We have some ideas about how that
works, and we have developed concepts like
‘central tolerance’ and ‘peripheral tolerance’ —
the two stages by which the immune system
learns how to avoid attacking its own body. But
when it comes to the mechanisms behind these
concepts, we still know very little.

“Today, it is no longer a problem to find mutations
that cause phenotypes, which used to be a bottleneck.”


What made you decide to use random 
mutagenesis screens — chance, basically — in 
your research? 
Chance leads to discoveries, and mutagenesis 
is a way to enhance one’s chances of finding a 
surprise. Often it is the exceptional observa- 
tions that lead to advances; once you under- 
stand exceptions, you understand the whole 
picture. We used chemically induced random 
mutagenesis of the mouse genome to identify 
genes and their functions. A mutation can create
an alternative form of a phenomenon — a
phenotype or trait — and we can learn a lot
by seeing this alternative state. Once I saw a
mouse with no eyelids. It simply had a mem- 
brane over the eyes. I found it fascinating that 
there is a single gene required for eyelids to 
develop. Similarly, one of the most interest- 
ing phenotypes I have ever seen is found in a 
mouse we called ‘Possum’. When Possum mice
are scruffed at the nape of the neck, they sud- 
denly freeze and go into a sort of trance, for 
want of a better description. For a few minutes 
they do not seem to be conscious, but we know 
from electroencephalograms that they are con- 
scious. The mutation has been identified as a 
defect in a single type of voltage-gated sodium 
channel — such a simple cause for such a complicated
behavioural phenotype! And remarkably,
the channel is not even located within the
central nervous system. I find that really puz- 
zling. Mutations get you thinking about how 
biological processes work. 

Another thing I love about mutagenesis is
that it is hypothesis-free. I think we can still
do good science without having a prediction.
If you take hypotheses out of the equation, you
also take away the biases that arise because we 
tend to like our own ideas. If you start with a 
hypothesis and you find that you were correct, 
then you cannot really claim to have been sur- 
prised. On the other hand, if you start with a 
phenotype and find the gene that was damaged 
to create the phenotype, then you can be very 
much surprised by your discovery. 


In that case, how does one develop an intuition 
for an interesting scientific question? 
There is no strict algorithm to follow that 
leads to interesting discoveries. In my experi- 
ence, scientists are guided mainly by instinct. 
In our case, instinct guides the design of 
screens. In prioritizing phenotypes for study, 
it helps to ask questions such as: ‘Is what we 
observe unlike anything that has ever been 
seen before?’ and ‘Does it have implications 
for some important aspect of existing theory?’ 
I get excited by phenotypes that mimic 
human disease. Today, it is no longer a 
problem to find the mutations that cause 
these phenotypes, which certainly used to 
be the bottleneck in the whole process. Now 
it takes us about one hour from first seeing 
a phenotype to finding the causative muta- 
tion, and in my lab we usually solve about 
two phenotypes per day. The difficult part is 
to understand the mechanism, and there we 
have to prioritize our experiments so that we 
learn as much as we can with the resources 
available. 


If there were no technical limitations, what 
would be your ideal experiment? 

I find the speed with which we can already 
sequence all of an organism's protein-coding 
genes just magical. The team in my lab is now 
sequencing about 80 whole exomes — the pro- 
tein-coding parts of the genome — every two 
weeks, and I am not sure we need to improve 
on that much in the future. I feel we have a 
surfeit of ability, so I am not crying out for
new technologies in my own area. But looking 
more broadly, I think the great technological 
challenge in medicine in the long term might 
be in pharmaceutical development. One can 
envisage a time when we know the three-dimensional
structure of all proteins, and that
might allow us to compute the structure of 
drugs that would block certain biological pro- 
cesses without having any side effects. It is an 
enormous hurdle, but the day may come when 
computation supplants much of the screening 
we do presently. 


If the scientific system were to be rebuilt from 
scratch, what do you think it should look like? 
It might actually look similar to what we have 
today. Funding has never been a pure meritoc- 
racy, but I do not see a fairer way of doing it, 
practically speaking. In the area of publishing, 
I have flirted with the idea that someday there
might be no need for peer-reviewed publica- 
tion. Instead, everyone could publish their 
best work on their website. Over time, people 
would learn who the reliable sources were and 
apply alternative ways of ranking performance. 
There is a lot of objection to this approach for 
good reasons. Some are horrified at the pros- 
pect that shoddy work would flourish in the 
absence of peer review. But the model I have 
in mind would be somewhat similar to the way 
that artists are evaluated. Who peer-reviewed 
Bach and his complicated fugues? Imagine 
what we might have lost if they had been 
rejected. 

Institutional organization is also really 
important. We need to maintain a mixture 
of people who work most effectively in small 
groups, as well as people who are at their best 
when part of large, organized efforts. Well- 
coordinated efforts amount to more than the 
sum of their parts. 


If you were not an immunologist, what would 
you be? 

I have always found enormous aesthetic 
enjoyment in nature. I am an amateur pho- 
tographer — I take photographs of birds and 
I like to hike. If I were not a scientist, I might
be a naturalist.

But if I were to pick some other field in 
science, I started out as a neurology resident, 
and disorders of higher cortical functions in 
humans still interest me. In the early 1980s, 
when I began seeing patients, the technology 
available to study neurobehavioural disease 
was not nearly as advanced as it is today. 
The opportunities to understand how the 
brain works are much greater now. Having a 
Nobel prize does give me the opportunity to 
broaden my horizons a bit, and I may move 
back into neuroscience one day. 


What characteristics do you look for in 
students? 

They need to have strong verbal abilities, both 
written and spoken. I find this to be a predictor 
of good performance in science in the long run. 
That may sound strange, but it is an observation
I have made many times over the years.


This interview has been edited for length and clarity. 


Christoph A. Thaiss is a PhD student in 
Q&A Francois Englert
Boson beginnings 


Francois Englert shared the 2013 Nobel Prize in Physics with Peter Higgs for the theoretical 
discovery of a mechanism that gives mass to subatomic particles. For this work, he collaborated 
with Robert Brout, who died in 2011. He looks back on his contribution to science with 
Thifhelimbilu Daphney Bucher. 


When you were just starting out, what did you 
find interesting about physics? 

Actually, I was an engineer. I started as an engi- 
neer, but I also theorized — and I found that 
more interesting: working out the underlying 
structure of things rather than their practical 
applications. 


What inspired your early theoretical work? 
You have to think back to the situation at the 
time. At the beginning of the 1960s, physicists 
understood long-range forces very well. The 
law of gravity — as described by the general 
theory of relativity — was proposed by Albert 
Einstein in 1915. Physicists also understood 
electromagnetism — the theory of all electric 
and magnetic phenomena, including the prop- 
erties of electromagnetic waves. These include 
light, radio waves, X-rays and γ-rays.
Short-range forces were absolutely not 
understood back then. So this is what Robert 
Brout and I, and shortly after also Peter Higgs, 
initiated — a theory of short-range forces. 


What links a theory of short-range forces with 
the idea of giving subatomic particles mass? 
Long-range forces are mediated by the exchange 
of particles that have no mass. A particle that has 
no mass, like a photon, travels with the velocity 
of light. So the idea in our theory of short-range 
forces was to give particles mass because when 
they have a mass they do not travel at the speed 
of light, they are slower than that. And the forces 
mediated by them become short-range. 


How did you and Brout develop the concept of 
the scalar boson? 

At the time, we were studying the theory of 
phase transition. Phase transition involves a 
change in the physical properties of matter 
— such as water becoming vapour or ice. We
drew very much from the work of physicist 
Yoichiro Nambu, who worked on phase tran- 
sitions and quantum-field theory and showed 
that they have similarities. This inspired us to 
introduce to field theory some new particles 
called scalar bosons, which have the property 
of being able to form condensates, as occurs in
phase transitions. 


How does that give particles mass? 

The condensate of bosons gives rise to a kind 
of sea that pervades the whole Universe. Parti- 
cles travel through that medium, and because 
of the action of the medium, they may slow 
down. And so they acquire mass. It is the inter- 
action of the condensate that gives mass, not 
the individual bosons. 


Are there any differences between your theory
and the one put forward by Higgs?

There is a difference in the method. Our one 
— which was inspired by Nambu — is more 
in line with modern physics than the Higgs 
approach. At the time, however, people found 
it more difficult. But our method was well 
chosen; all of the subsequent development of 
theory in the field has been done with it. 


So why is the term ‘scalar boson’ less well
known than ‘Higgs boson’? 

That is because of an important paper by US 
theoretical physicist Steven Weinberg, who 
shared the Nobel prize in 1979. In a New York 
Review of Books article in 2012 he discusses
this. I can read you a few sentences. It says: 
“As to my responsibility for the name ‘Higgs
boson’, because of a mistake in reading the
dates on three earlier papers, I thought that the 
earliest was the one by Higgs, so in my 1967 
paper I cited Higgs first, and have done so since 
then. Other physicists apparently have followed 
my lead. But ... the earliest paper of the three I 
cited was actually the one by Robert Brout and 
Francois Englert ... But the name ‘Higgs boson’
seems to have stuck.” 


Does the name of the particle bother you? 
When we finished our calculations about 
scalar bosons, we celebrated because we had 
worked out something that was mathemati- 
cally and logically consistent — not because 
we expected recognition at that time. So I did 
not care about the name. But the problem is 
that it is not correct. Correct labelling would be 
the ‘BEH boson’, and that would be welcome. I
do not wish to complain, however — not now 
that I have won the Nobel prize!


This interview has been edited for length and clarity. 